<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Randomized Controlled Trial (RCT) | Carlos Mendez</title><link>https://carlos-mendez.org/category/randomized-controlled-trial-rct/</link><atom:link href="https://carlos-mendez.org/category/randomized-controlled-trial-rct/index.xml" rel="self" type="application/rss+xml"/><description>Randomized Controlled Trial (RCT)</description><generator>Wowchemy (https://wowchemy.com)</generator><language>en-us</language><copyright>Carlos Mendez</copyright><lastBuildDate>Tue, 24 Mar 2026 00:00:00 +0000</lastBuildDate><image><url>https://carlos-mendez.org/media/icon_huedfae549300b4ca5d201a9bd09a3ecd5_79625_512x512_fill_lanczos_center_3.png</url><title>Randomized Controlled Trial (RCT)</title><link>https://carlos-mendez.org/category/randomized-controlled-trial-rct/</link></image><item><title>Evaluating a Cash Transfer Program (RCT) with Panel Data in Stata</title><link>https://carlos-mendez.org/post/stata_rct/</link><pubDate>Tue, 24 Mar 2026 00:00:00 +0000</pubDate><guid>https://carlos-mendez.org/post/stata_rct/</guid><description>&lt;h2 id="1-overview">1. Overview&lt;/h2>
&lt;p>Cash transfer programs are among the most common development interventions worldwide. Governments and international organizations spend billions of dollars each year providing direct cash transfers to low-income households. But how do we rigorously evaluate whether these programs actually work? This tutorial walks through the complete workflow of analyzing a &lt;strong>randomized controlled trial (RCT)&lt;/strong> with &lt;strong>panel data&lt;/strong> in Stata &amp;mdash; from verifying that randomization succeeded, to estimating treatment effects using increasingly sophisticated methods, to comparing results across all approaches.&lt;/p>
&lt;p>We use simulated data from a hypothetical cash transfer program targeting 2,000 households in a developing country. The key advantage of simulated data is that we know the &lt;strong>true treatment effect&lt;/strong> before we begin: the program increases household consumption by &lt;strong>12%&lt;/strong> (0.12 log points). This known ground truth gives us a perfect benchmark to evaluate how well each econometric method recovers the correct answer.&lt;/p>
&lt;p>The tutorial progresses from simple to sophisticated. We start with basic balance checks, then estimate treatment effects three different ways using only endline data &amp;mdash; regression adjustment (RA), inverse probability weighting (IPW), and doubly robust (DR) methods. Next, we unlock the full power of panel data with difference-in-differences (DiD) and its doubly robust extension (DRDID). Finally, we address the real-world complication of imperfect compliance.&lt;/p>
&lt;h3 id="learning-objectives">Learning objectives&lt;/h3>
&lt;ul>
&lt;li>Verify baseline balance using t-tests, standardized mean differences, and balance plots&lt;/li>
&lt;li>Distinguish between ATE and ATT and identify which estimand each method targets&lt;/li>
&lt;li>Understand three estimation strategies &amp;mdash; regression adjustment, inverse probability weighting, and doubly robust &amp;mdash; and when to use each&lt;/li>
&lt;li>Estimate treatment effects using all three approaches and compare their results&lt;/li>
&lt;li>Leverage panel data structure with difference-in-differences and understand why DiD estimates ATT&lt;/li>
&lt;li>Apply doubly robust difference-in-differences (DRDID) for modern panel data analysis&lt;/li>
&lt;li>Separate the effect of treatment offer from treatment receipt under imperfect compliance&lt;/li>
&lt;/ul>
&lt;hr>
&lt;h2 id="2-study-design">2. Study design&lt;/h2>
&lt;p>This RCT evaluates a cash transfer program designed to boost household consumption. The study tracks 2,000 households across two survey waves &amp;mdash; a &lt;strong>baseline&lt;/strong> in 2021 (before the program) and an &lt;strong>endline&lt;/strong> in 2024 (after the program was implemented). The diagram below summarizes the experimental design.&lt;/p>
&lt;pre>&lt;code class="language-mermaid">graph TD
POP[&amp;quot;&amp;lt;b&amp;gt;2,000 Households&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;Balanced panel&amp;lt;br/&amp;gt;(observed in 2021 and 2024)&amp;quot;]
STRAT[&amp;quot;&amp;lt;b&amp;gt;Stratified Randomization&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;Within poverty strata&amp;quot;]
TRT[&amp;quot;&amp;lt;b&amp;gt;Treatment Group&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;(~1,000 households)&amp;lt;br/&amp;gt;Offered cash transfer&amp;quot;]
CTL[&amp;quot;&amp;lt;b&amp;gt;Control Group&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;(~1,000 households)&amp;lt;br/&amp;gt;No offer&amp;quot;]
COMP1[&amp;quot;85% receive&amp;lt;br/&amp;gt;the transfer&amp;quot;]
COMP2[&amp;quot;15% do not&amp;lt;br/&amp;gt;receive&amp;quot;]
COMP3[&amp;quot;5% receive&amp;lt;br/&amp;gt;the transfer&amp;quot;]
COMP4[&amp;quot;95% do not&amp;lt;br/&amp;gt;receive&amp;quot;]
BASE[&amp;quot;&amp;lt;b&amp;gt;Baseline 2021&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;Pre-treatment survey&amp;quot;]
END[&amp;quot;&amp;lt;b&amp;gt;Endline 2024&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;Post-treatment survey&amp;quot;]
POP --&amp;gt; BASE
BASE --&amp;gt; STRAT
STRAT --&amp;gt; TRT
STRAT --&amp;gt; CTL
TRT --&amp;gt; COMP1
TRT --&amp;gt; COMP2
CTL --&amp;gt; COMP3
CTL --&amp;gt; COMP4
COMP1 --&amp;gt; END
COMP2 --&amp;gt; END
COMP3 --&amp;gt; END
COMP4 --&amp;gt; END
style POP fill:#6a9bcc,stroke:#141413,color:#fff
style STRAT fill:#d97757,stroke:#141413,color:#fff
style TRT fill:#00d4c8,stroke:#141413,color:#141413
style CTL fill:#6a9bcc,stroke:#141413,color:#fff
style BASE fill:#6a9bcc,stroke:#141413,color:#fff
style END fill:#d97757,stroke:#141413,color:#fff
style COMP1 fill:#00d4c8,stroke:#141413,color:#141413
style COMP2 fill:#141413,stroke:#d97757,color:#fff
style COMP3 fill:#d97757,stroke:#141413,color:#fff
style COMP4 fill:#141413,stroke:#6a9bcc,color:#fff
&lt;/code>&lt;/pre>
&lt;p>The randomization was &lt;strong>stratified by poverty status&lt;/strong> (block randomization), ensuring that treatment and control groups started with similar proportions of poor and non-poor households. A critical real-world feature of this study is &lt;strong>imperfect compliance&lt;/strong> &amp;mdash; only 85% of households offered the treatment actually received the cash transfer, while 5% of control households received it through other channels.&lt;/p>
&lt;h3 id="variables">Variables&lt;/h3>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th>Variable&lt;/th>
&lt;th>Description&lt;/th>
&lt;th>Type&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>&lt;code>id&lt;/code>&lt;/td>
&lt;td>Household identifier&lt;/td>
&lt;td>Panel ID&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;code>year&lt;/code>&lt;/td>
&lt;td>Survey year (2021 or 2024)&lt;/td>
&lt;td>Time variable&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;code>post&lt;/code>&lt;/td>
&lt;td>Endline indicator (1 = 2024)&lt;/td>
&lt;td>Binary&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;code>treat&lt;/code>&lt;/td>
&lt;td>Random assignment to offer (intent-to-treat)&lt;/td>
&lt;td>Binary&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;code>D&lt;/code>&lt;/td>
&lt;td>Actual receipt of cash transfer&lt;/td>
&lt;td>Binary (endogenous)&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;code>y&lt;/code>&lt;/td>
&lt;td>Log monthly consumption&lt;/td>
&lt;td>Continuous (outcome)&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;code>age&lt;/code>&lt;/td>
&lt;td>Age of household head&lt;/td>
&lt;td>Continuous&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;code>female&lt;/code>&lt;/td>
&lt;td>Female-headed household&lt;/td>
&lt;td>Binary&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;code>poverty&lt;/code>&lt;/td>
&lt;td>Poverty status at baseline&lt;/td>
&lt;td>Binary&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;code>edu&lt;/code>&lt;/td>
&lt;td>Years of education&lt;/td>
&lt;td>Continuous&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;code>y0&lt;/code>&lt;/td>
&lt;td>Log monthly consumption at baseline (pre-treatment)&lt;/td>
&lt;td>Continuous&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;blockquote>
&lt;p>&lt;strong>Offer vs. receipt&lt;/strong> &amp;mdash; The variable &lt;code>treat&lt;/code> captures random assignment to the program offer. It is exogenous (determined by randomization) and unrelated to household characteristics. The variable &lt;code>D&lt;/code> captures actual receipt of the cash transfer. It is &lt;strong>endogenous&lt;/strong> &amp;mdash; households that chose to take up the program may differ systematically from those that did not. Most methods in this tutorial estimate the effect of the &lt;strong>offer&lt;/strong> (intent-to-treat). Section 10 addresses the effect of &lt;strong>receipt&lt;/strong>.&lt;/p>
&lt;/blockquote>
&lt;hr>
&lt;h2 id="3-analytical-roadmap">3. Analytical roadmap&lt;/h2>
&lt;p>The diagram below shows the progression of methods we will use. Each stage builds on the previous one, adding complexity and robustness.&lt;/p>
&lt;pre>&lt;code class="language-mermaid">graph LR
A[&amp;quot;&amp;lt;b&amp;gt;Balance&amp;lt;br/&amp;gt;Checks&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;&amp;lt;i&amp;gt;Section 5&amp;lt;/i&amp;gt;&amp;quot;]
B[&amp;quot;&amp;lt;b&amp;gt;Cross-sectional&amp;lt;br/&amp;gt;RA / IPW / DR&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;&amp;lt;i&amp;gt;Sections 7--8&amp;lt;/i&amp;gt;&amp;quot;]
C[&amp;quot;&amp;lt;b&amp;gt;Panel Data&amp;lt;br/&amp;gt;DiD / DR-DiD&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;&amp;lt;i&amp;gt;Section 9&amp;lt;/i&amp;gt;&amp;quot;]
D[&amp;quot;&amp;lt;b&amp;gt;Endogenous&amp;lt;br/&amp;gt;Treatment&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;&amp;lt;i&amp;gt;Section 10&amp;lt;/i&amp;gt;&amp;quot;]
A --&amp;gt; B
B --&amp;gt; C
C --&amp;gt; D
style A fill:#6a9bcc,stroke:#141413,color:#fff
style B fill:#d97757,stroke:#141413,color:#fff
style C fill:#00d4c8,stroke:#141413,color:#141413
style D fill:#141413,stroke:#d97757,color:#fff
&lt;/code>&lt;/pre>
&lt;p>We first establish that randomization worked (balance checks). Then we estimate treatment effects three ways using only endline data &amp;mdash; regression adjustment, inverse probability weighting, and doubly robust methods. Next, we leverage the full panel structure with difference-in-differences. Finally, we address imperfect compliance by separating the effect of the offer from the effect of receipt.&lt;/p>
&lt;hr>
&lt;h2 id="4-data-loading-and-exploration">4. Data loading and exploration&lt;/h2>
&lt;p>We begin by loading the simulated dataset from a public GitHub repository and examining its structure.&lt;/p>
&lt;pre>&lt;code class="language-stata">use &amp;quot;https://github.com/quarcs-lab/data-open/raw/master/ametrics/dataSIM4RCT.dta&amp;quot;, clear
des y age edu female poverty treat D
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">Contains data
Observations: 4,000
Variables: 10
Variable Storage Display Value
name type format label Variable label
─────────────────────────────────────────────────────────────
y float %9.0g Log monthly consumption
age float %9.0g
edu float %9.0g
female float %9.0g
poverty float %9.0g
treat float %9.0g Assignment to offer (Z)
D float %9.0g Receipt of cash transfer
&lt;/code>&lt;/pre>
&lt;p>The dataset contains 4,000 observations &amp;mdash; 2,000 households observed at two time points (baseline 2021 and endline 2024). The outcome variable &lt;code>y&lt;/code> is log monthly consumption, &lt;code>treat&lt;/code> is the random assignment indicator, and &lt;code>D&lt;/code> is the actual receipt indicator.&lt;/p>
&lt;p>Now let us examine summary statistics at baseline and endline separately.&lt;/p>
&lt;pre>&lt;code class="language-stata">sum y age edu female poverty treat D if post==0
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text"> Variable | Obs Mean Std. dev. Min Max
─────────────+─────────────────────────────────────────────────────────
y | 2,000 10.0154 .4348886 8.454445 11.48253
age | 2,000 35.126 9.650839 18 68
edu | 2,000 12.0275 1.9889 6 18
female | 2,000 .5085 .5000528 0 1
poverty | 2,000 .3125 .4636283 0 1
treat | 2,000 .518 .4998009 0 1
D | 2,000 0 0 0 0
&lt;/code>&lt;/pre>
&lt;p>At baseline, mean log consumption is approximately 10.02, the average household head is 35 years old with 12 years of education, about 51% of households are female-headed, and 31% are in poverty. About 52% of households are assigned to treatment (&lt;code>treat&lt;/code>), close to the 50% expected from the randomization. Crucially, the receipt variable &lt;code>D&lt;/code> is zero for all households at baseline because the program had not yet been implemented.&lt;/p>
&lt;pre>&lt;code class="language-stata">sum y age edu female poverty treat D if post==1
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text"> Variable | Obs Mean Std. dev. Min Max
─────────────+─────────────────────────────────────────────────────────
y | 2,000 10.1137 .4382183 8.638689 11.55002
age | 2,000 35.126 9.650839 18 68
edu | 2,000 12.0275 1.9889 6 18
female | 2,000 .5085 .5000528 0 1
poverty | 2,000 .3125 .4636283 0 1
treat | 2,000 .518 .4998009 0 1
D | 2,000 .4615 .4986402 0 1
&lt;/code>&lt;/pre>
&lt;p>At endline, mean consumption has risen to approximately 10.11, reflecting both the natural time trend and the treatment effect. The receipt variable &lt;code>D&lt;/code> is now non-zero &amp;mdash; about 46% of all households received the cash transfer (combining treated households who took up the program and control households who received it through other channels).&lt;/p>
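&lt;p>The compliance rates described in Section 2 can be checked directly against the data. The short sketch below uses only the variables listed in the variable table and cross-tabulates assignment against receipt at endline. If the data match the design, the row percentages should show roughly 85% take-up in the treatment arm and roughly 5% in the control arm.&lt;/p>
&lt;pre>&lt;code class="language-stata">* Cross-tabulate assignment (treat) against receipt (D) at endline
* Row percentages give the take-up rate within each assignment arm
tabulate treat D if post==1, row
&lt;/code>&lt;/pre>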
&lt;p>Finally, we declare the panel structure so Stata knows we have repeated observations.&lt;/p>
&lt;pre>&lt;code class="language-stata">xtset id year
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">Panel variable: id (strongly balanced)
Time variable: year, 2021 to 2024, but with gaps
Delta: 1 unit
&lt;/code>&lt;/pre>
&lt;p>The panel is &lt;strong>strongly balanced&lt;/strong> &amp;mdash; all 2,000 households appear in both survey waves, with no attrition. This is an ideal scenario that simplifies our analysis.&lt;/p>
&lt;hr>
&lt;h2 id="5-baseline-balance-checks">5. Baseline balance checks&lt;/h2>
&lt;p>Before estimating any treatment effects, we must verify that randomization produced comparable treatment and control groups at baseline. This is the most fundamental quality check in any RCT.&lt;/p>
&lt;h3 id="51-t-tests-and-proportion-tests">5.1 T-tests and proportion tests&lt;/h3>
&lt;p>We compare the treatment and control groups on all baseline characteristics using two-sample t-tests for continuous variables and proportion tests for binary variables.&lt;/p>
&lt;pre>&lt;code class="language-stata">ttest y if post==0, by(treat)
ttest age if post==0, by(treat)
ttest edu if post==0, by(treat)
prtest female if post==0, by(treat)
prtest poverty if post==0, by(treat)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">Variable | Control Mean Treat Mean Diff p-value
────────────+──────────────────────────────────────────────
y | 10.025 10.006 0.019 0.330
age | 35.335 34.931 0.404 0.350
edu | 11.974 12.077 -0.103 0.247
female | 0.484 0.531 -0.046 0.038 **
poverty | 0.307 0.318 -0.011 0.612
&lt;/code>&lt;/pre>
&lt;p>Most variables show no statistically significant differences between the treatment and control groups. The exception is &lt;code>female&lt;/code>, with a p-value of 0.038: the treatment group has about 4.6 percentage points more female-headed households than the control group. Because assignment was random, this imbalance arose purely by chance; with five variables tested at the 5% level, one rejection is not unusual. We still account for it by including &lt;code>female&lt;/code> as a covariate in the estimation.&lt;/p>
&lt;h3 id="52-balance-table-with-standardized-mean-differences">5.2 Balance table with standardized mean differences&lt;/h3>
&lt;p>P-values are sensitive to sample size: a large sample can make tiny differences &amp;ldquo;significant.&amp;rdquo; Standardized mean differences (SMDs) provide a scale-free measure of imbalance that does not depend on sample size in this way. The SMD is the difference in group means divided by the pooled standard deviation, which puts all variables on the same scale regardless of their units. A common rule of thumb is that absolute SMDs below 10% (0.1 standard deviations) indicate adequate balance.&lt;/p>
&lt;pre>&lt;code class="language-stata">capture ssc install ietoolkit, replace
iebaltab y age edu female poverty if post==0, grpvar(treat)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text"> (1) (2) (1)-(2)
Control Treatment Difference
y 10.025 10.006 0.019
(0.014) (0.014) (0.019)
age 35.335 34.931 0.404
(0.316) (0.295) (0.432)
edu 11.974 12.077 -0.103
(0.063) (0.063) (0.089)
female 0.484 0.531 -0.046**
(0.016) (0.016) (0.022)
poverty 0.307 0.318 -0.011
(0.015) (0.014) (0.021)
N 964 1,036
&lt;/code>&lt;/pre>
&lt;p>The balance table confirms our t-test findings. With 964 control and 1,036 treatment households, all variables are well balanced except &lt;code>female&lt;/code>, which shows a statistically significant difference (marked with **). The outcome variable &lt;code>y&lt;/code> has a negligible difference of 0.019 at baseline &amp;mdash; the groups started with essentially identical consumption levels.&lt;/p>
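&lt;p>To make the SMD definition concrete, the sketch below computes it by hand for &lt;code>female&lt;/code> at baseline. It assumes one common convention for the pooled standard deviation (the square root of the average of the two group variances); other conventions exist, and the scalar names are only illustrative. Under this convention the result should be close to the 9.3% shown for &lt;code>female&lt;/code> in the balance plot.&lt;/p>
&lt;pre>&lt;code class="language-stata">* Hand-computed standardized mean difference for female at baseline
quietly summarize female if post==0 &amp;amp; treat==1
scalar m1 = r(mean)
scalar v1 = r(Var)
quietly summarize female if post==0 &amp;amp; treat==0
scalar m0 = r(mean)
scalar v0 = r(Var)
* Assumed convention: pooled SD = sqrt of the average of the two group variances
scalar smd = (m1 - m0) / sqrt((v1 + v0) / 2)
display &amp;quot;SMD for female = &amp;quot; %6.4f smd
&lt;/code>&lt;/pre>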
&lt;h3 id="53-visual-balance-plot">5.3 Visual balance plot&lt;/h3>
&lt;p>A balance plot provides a visual overview of all SMDs at once, making it easy to spot problematic variables.&lt;/p>
&lt;pre>&lt;code class="language-stata">net install balanceplot, from(&amp;quot;https://tdmize.github.io/data&amp;quot;) replace
balanceplot y age edu i.female i.poverty, group(treat) table nodropdv
&lt;/code>&lt;/pre>
&lt;p>&lt;img src="stata_rct_balance_plot.png" alt="Balance plot showing standardized mean differences for all covariates. All variables fall within the 10% threshold, with female closest at approximately 9.3%.">&lt;/p>
&lt;p>The balance plot shows that all SMDs fall below the 10% threshold (indicated by the dashed vertical lines). The variable &lt;code>female&lt;/code> has the largest SMD at approximately 9.3% &amp;mdash; close to but still below the conventional threshold. The remaining variables &amp;mdash; consumption, age, education, and poverty &amp;mdash; all have SMDs well below 5%. Overall, randomization was successful, but we should control for &lt;code>female&lt;/code> (and other covariates) in our estimation to improve precision.&lt;/p>
&lt;h3 id="54-aipw-as-a-formal-balance-test">5.4 AIPW as a formal balance test&lt;/h3>
&lt;p>As a final and more formal balance check, we can use the Augmented Inverse Probability Weighting (AIPW) estimator on &lt;strong>baseline data only&lt;/strong>. If randomization was successful, the estimated &amp;ldquo;treatment effect&amp;rdquo; at baseline should be zero &amp;mdash; since the program had not yet been implemented, there should be no difference between groups.&lt;/p>
&lt;pre>&lt;code class="language-stata">preserve
keep if post==0
teffects aipw (y age edu i.female i.poverty) (treat age edu i.female i.poverty)
&lt;/code>&lt;/pre>
&lt;blockquote>
&lt;p>&lt;strong>Tip:&lt;/strong> The &lt;code>preserve&lt;/code> command saves a snapshot of the current data. After the balance analysis, use &lt;code>restore&lt;/code> to return to the full dataset. The companion do-file handles this automatically.&lt;/p>
&lt;/blockquote>
&lt;pre>&lt;code class="language-text">Treatment-effects estimation Number of obs = 2,000
Estimator : augmented IPW
Outcome model : linear
Treatment model: logit
──────────────────────────────────────────────────────────────────────────────
| Robust
y | Coefficient std. err. z P&amp;gt;|z| [95% conf. interval]
─────────────+────────────────────────────────────────────────────────────────
ATE |
treat |
(1 vs 0) | -.0244086 .018861 -1.29 0.196 -.0613754 .0125582
─────────────+────────────────────────────────────────────────────────────────
POmean |
treat |
0 | 10.02792 .0138363 724.75 0.000 10.0008 10.05504
──────────────────────────────────────────────────────────────────────────────
&lt;/code>&lt;/pre>
&lt;p>The AIPW-estimated &amp;ldquo;ATE&amp;rdquo; at baseline is -0.024 with a p-value of 0.196 &amp;mdash; not statistically significant. This confirms that there is no detectable pre-treatment difference between the groups after adjusting for covariates. The treatment and control groups were statistically comparable before the program began.&lt;/p>
&lt;p>Now we run the diagnostic checks for the AIPW model.&lt;/p>
&lt;pre>&lt;code class="language-stata">tebalance overid
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">Overidentification test for covariate balance
H0: Covariates are balanced
chi2(5) = 3.216
Prob &amp;gt; chi2 = 0.6670
&lt;/code>&lt;/pre>
&lt;p>The overidentification test fails to reject the null hypothesis of covariate balance (p = 0.667). There is no statistical evidence of residual imbalance after weighting.&lt;/p>
&lt;pre>&lt;code class="language-stata">tebalance summarize
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text"> |Standardized differences Variance ratio
| Raw Weighted Raw Weighted
----------------+------------------------------------------------
age | -.0417918 .0002505 .9318894 .9446877
edu | .0519015 -6.96e-06 1.071677 1.078214
female |
1 | .0929611 6.51e-06 .9970775 .9999996
poverty |
1 | .0226764 .0002864 1.018475 1.000233
&lt;/code>&lt;/pre>
&lt;p>The balance summary reveals that the raw standardized differences (before weighting) show the &lt;code>female&lt;/code> imbalance at 0.093, consistent with our earlier findings. After weighting, all standardized differences shrink to near zero (all below 0.001) &amp;mdash; excellent balance. The variance ratios are all close to 1.0, indicating similar spread across groups.&lt;/p>
&lt;pre>&lt;code class="language-stata">tebalance density y
&lt;/code>&lt;/pre>
&lt;p>&lt;img src="stata_rct_density_y.png" alt="Density plot showing the distribution of log consumption for treatment and control groups, before and after AIPW weighting. The weighted distributions overlap almost perfectly.">&lt;/p>
&lt;p>The density plot shows that, after weighting by the estimated propensity scores, the distributions of log consumption in the treatment and control groups overlap almost perfectly. Any remaining pre-existing difference in the outcome variable is visually negligible.&lt;/p>
&lt;pre>&lt;code class="language-stata">teffects overlap
&lt;/code>&lt;/pre>
&lt;p>&lt;img src="stata_rct_overlap_baseline.png" alt="Overlap plot showing kernel densities of estimated propensity scores for treatment and control groups. Both distributions span approximately 0.43 to 0.55 with substantial overlap.">&lt;/p>
&lt;p>The overlap plot shows that propensity scores for both groups are concentrated between approximately 0.43 and 0.55 &amp;mdash; well within the range where matching and weighting are feasible. There are no extreme propensity scores near 0 or 1, confirming that the common support condition is satisfied. This is expected in a well-designed RCT where treatment probability is approximately 0.50 for all households.&lt;/p>
&lt;pre>&lt;code class="language-stata">restore
&lt;/code>&lt;/pre>
&lt;p>This AIPW-based balance analysis also serves a pedagogical purpose: it introduces the concept of &lt;strong>doubly robust&lt;/strong> estimation before we use it for treatment effect estimation in Section 8.&lt;/p>
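&lt;p>The propensity scores behind the overlap plot can also be inspected by hand. The sketch below assumes the same logit specification as the treatment model above and stores the predictions in a temporary variable (&lt;code>ps_hat&lt;/code> is a hypothetical name). The minimum and maximum by arm should roughly match the range seen in the plot.&lt;/p>
&lt;pre>&lt;code class="language-stata">* Logit model of assignment on baseline covariates (baseline rows only)
quietly logit treat age edu i.female i.poverty if post==0
* Predicted probability of assignment; ps_hat is a hypothetical variable name
predict double ps_hat if post==0, pr
* Range of the estimated scores within each assignment arm
tabstat ps_hat if post==0, by(treat) statistics(min p50 max)
drop ps_hat
&lt;/code>&lt;/pre>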
&lt;hr>
&lt;h2 id="6-what-are-we-estimating-ate-vs-att">6. What are we estimating? ATE vs. ATT&lt;/h2>
&lt;p>Before diving into estimation, we need to be precise about &lt;strong>what&lt;/strong> we are trying to estimate. There are two fundamental causal quantities in program evaluation.&lt;/p>
&lt;p>The &lt;strong>Average Treatment Effect (ATE)&lt;/strong> answers the policymaker&amp;rsquo;s question: &lt;em>&amp;ldquo;What would happen if we scaled this program to the entire population?&amp;quot;&lt;/em>&lt;/p>
&lt;p>$$ATE = E[Y(1) - Y(0)]$$&lt;/p>
&lt;p>where $Y(1)$ is the potential outcome under treatment and $Y(0)$ is the potential outcome under control, averaged over the &lt;strong>entire population&lt;/strong> (both treated and untreated).&lt;/p>
&lt;p>The &lt;strong>Average Treatment Effect on the Treated (ATT)&lt;/strong> answers the evaluator&amp;rsquo;s question: &lt;em>&amp;ldquo;Did the program benefit those who were assigned to it?&amp;quot;&lt;/em>&lt;/p>
&lt;p>$$ATT = E[Y(1) - Y(0) \mid T = 1]$$&lt;/p>
&lt;p>This averages the treatment effect only over the &lt;strong>treated group&lt;/strong> &amp;mdash; the households that were assigned to receive the cash transfer.&lt;/p>
&lt;p>In an RCT, random assignment makes the treated group representative of the full study population, so ATE and ATT coincide in expectation. With &lt;strong>homogeneous treatment effects&lt;/strong> (the program affects everyone equally), they are identical in every sample. When treatment effects are &lt;strong>heterogeneous&lt;/strong> (the program benefits some households more than others), the two can differ in a given sample because of chance imbalances. For example, if poorer households benefit more from cash transfers and the treatment group happens to have a higher share of poor households, the ATT could be larger than the ATE.&lt;/p>
&lt;p>Understanding this distinction is critical because different methods target different estimands. Cross-sectional methods (RA, IPW, DR) can estimate &lt;strong>either&lt;/strong> ATE or ATT. Difference-in-differences inherently estimates the &lt;strong>ATT only&lt;/strong>. We will return to this point in Section 9.&lt;/p>
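&lt;p>One way to see how the two estimands relate is to split the ATE by assignment status using the law of total expectation. The label $ATU$ (average treatment effect on the untreated) is introduced here only for this illustration and is not used elsewhere in the tutorial:&lt;/p>
&lt;p>$$ATE = \Pr(T = 1) \cdot ATT + \Pr(T = 0) \cdot ATU, \qquad ATU = E[Y(1) - Y(0) \mid T = 0]$$&lt;/p>
&lt;p>The ATE and ATT are therefore equal whenever $ATT = ATU$, that is, whenever the average effect is the same in the treated and untreated groups.&lt;/p>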
&lt;blockquote>
&lt;p>&lt;strong>Note on RCTs&lt;/strong> &amp;mdash; In a randomized experiment, treatment assignment is independent of potential outcomes. This means that simple comparisons between treatment and control groups are already unbiased estimates of the ATE. When we add covariates (regression adjustment, IPW, doubly robust), we are not removing bias &amp;mdash; we are &lt;strong>improving precision&lt;/strong> by accounting for residual variation. This is different from observational studies, where covariate adjustment is needed to address confounding.&lt;/p>
&lt;/blockquote>
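&lt;p>The precision argument can be illustrated with two plain regressions on endline data. This is a minimal sketch and not one of the estimators developed in the following sections. If the covariates predict consumption, the standard error on &lt;code>treat&lt;/code> would typically be smaller in the second regression, while the point estimates should stay close to each other.&lt;/p>
&lt;pre>&lt;code class="language-stata">* Unadjusted difference in endline means (coefficient on treat)
regress y i.treat if post==1, vce(robust)
* Same comparison with baseline covariates added to absorb residual variation
regress y i.treat age edu i.female i.poverty if post==1, vce(robust)
&lt;/code>&lt;/pre>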
&lt;hr>
&lt;h2 id="7-three-strategies-for-causal-estimation">7. Three strategies for causal estimation&lt;/h2>
&lt;p>We now understand &lt;em>what&lt;/em> we want to estimate (ATE and ATT from Section 6). The question becomes &lt;em>how&lt;/em> to estimate it. Three families of methods exist, each taking a fundamentally different approach to solving the missing-data problem at the heart of causal inference. Each method models a different part of the data-generating process, and understanding these differences is essential for interpreting results and choosing the right tool.&lt;/p>
&lt;h3 id="71-regression-adjustment-ra-----modeling-the-outcome">7.1 Regression Adjustment (RA) &amp;mdash; modeling the outcome&lt;/h3>
&lt;p>Regression adjustment solves the missing-data problem by &lt;strong>predicting the unobserved potential outcomes&lt;/strong>. It fits separate regression models for treated and untreated groups. For each household, it uses these models to predict two potential outcomes: what consumption would be if treated, $\hat{\mu}_1(X_i)$, and what consumption would be if untreated, $\hat{\mu}_0(X_i)$. Since we only observe one of these for each household, the model fills in the missing counterfactual. The treatment effect for each household is the difference between the two predictions, and the ATE is the average across all households.&lt;/p>
&lt;p>The Stata documentation describes this succinctly: &lt;em>&amp;ldquo;RA estimators use means of predicted outcomes for each treatment level to estimate each POM. ATEs and ATETs are differences in estimated POMs.&amp;quot;&lt;/em>&lt;/p>
&lt;p>&lt;strong>Analogy &amp;mdash; predicting exam scores.&lt;/strong> Imagine two study methods (A and B) being tested on students. You observe each student using only one method. RA fits a model predicting test scores based on student characteristics (prior GPA, hours studied) separately for method-A and method-B users. Then, for &lt;em>every&lt;/em> student, it predicts what their score would have been under &lt;em>both&lt;/em> methods &amp;mdash; even the one they did not use. The average difference in predicted scores is the treatment effect.&lt;/p>
&lt;pre>&lt;code class="language-mermaid">graph TD
DATA[&amp;quot;&amp;lt;b&amp;gt;Observed Data&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;Each household observed&amp;lt;br/&amp;gt;under ONE treatment only&amp;quot;]
M0[&amp;quot;&amp;lt;b&amp;gt;Fit outcome model&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;using control group&amp;lt;br/&amp;gt;&amp;lt;i&amp;gt;Y = f(age, edu, female, poverty)&amp;lt;/i&amp;gt;&amp;quot;]
M1[&amp;quot;&amp;lt;b&amp;gt;Fit outcome model&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;using treated group&amp;lt;br/&amp;gt;&amp;lt;i&amp;gt;Y = f(age, edu, female, poverty)&amp;lt;/i&amp;gt;&amp;quot;]
P0[&amp;quot;Predict &amp;lt;b&amp;gt;Ŷ₀&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;for ALL households&amp;quot;]
P1[&amp;quot;Predict &amp;lt;b&amp;gt;Ŷ₁&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;for ALL households&amp;quot;]
ATE[&amp;quot;&amp;lt;b&amp;gt;ATE&amp;lt;/b&amp;gt; = Average of&amp;lt;br/&amp;gt;(Ŷ₁ − Ŷ₀)&amp;quot;]
DATA --&amp;gt; M0
DATA --&amp;gt; M1
M0 --&amp;gt; P0
M1 --&amp;gt; P1
P0 --&amp;gt; ATE
P1 --&amp;gt; ATE
style DATA fill:#141413,stroke:#6a9bcc,color:#fff
style M0 fill:#6a9bcc,stroke:#141413,color:#fff
style M1 fill:#6a9bcc,stroke:#141413,color:#fff
style P0 fill:#6a9bcc,stroke:#141413,color:#fff
style P1 fill:#6a9bcc,stroke:#141413,color:#fff
style ATE fill:#6a9bcc,stroke:#141413,color:#fff
&lt;/code>&lt;/pre>
&lt;p>&lt;strong>The RA estimator.&lt;/strong> Formally, the ATE under regression adjustment is:&lt;/p>
&lt;p>$$\hat{\tau}_{RA}^{ATE} = \frac{1}{N} \sum_{i=1}^{N} \left[ \hat{\mu}_1(X_i) - \hat{\mu}_0(X_i) \right]$$&lt;/p>
&lt;p>where $\hat{\mu}_1(X)$ is the predicted outcome under treatment (fitted from treated observations) and $\hat{\mu}_0(X)$ is the predicted outcome under control (fitted from untreated observations), both evaluated at each household&amp;rsquo;s covariates $X_i$. In plain language: for each household, the model predicts what their consumption would be if they received the cash transfer and what it would be if they did not. The difference is the household&amp;rsquo;s estimated treatment effect. Averaging these across all $N$ households gives the ATE.&lt;/p>
&lt;p>For the ATT, we restrict the average to treated units only:&lt;/p>
&lt;p>$$\hat{\tau}_{RA}^{ATT} = \frac{1}{N_1} \sum_{i: T_i = 1} \left[ \hat{\mu}_1(X_i) - \hat{\mu}_0(X_i) \right]$$&lt;/p>
&lt;p>where $N_1$ is the number of treated households.&lt;/p>
&lt;p>&lt;strong>Mini example from our data.&lt;/strong> Consider Household A: a household in poverty headed by a 40-year-old woman with 10 years of education. The treated outcome model predicts its consumption at 10.17 log points. The untreated outcome model predicts 10.05. The household&amp;rsquo;s estimated treatment effect is $10.17 - 10.05 = 0.12$. Averaging such differences over all 2,000 endline households gives the ATE.&lt;/p>
&lt;p>&lt;strong>Stata implementation.&lt;/strong> The &lt;code>teffects ra&lt;/code> command fits linear outcome models by default. The first parenthesis specifies the outcome model (outcome variable + covariates), and the second specifies the treatment variable: &lt;code>teffects ra (y c.age c.edu i.female i.poverty) (treat), ate&lt;/code>.&lt;/p>
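&lt;p>To connect the formulas to the command, the sketch below reproduces the RA point estimates by hand on endline data. The variable names &lt;code>mu1_hat&lt;/code>, &lt;code>mu0_hat&lt;/code>, and &lt;code>te_hat&lt;/code> are hypothetical. The means of &lt;code>te_hat&lt;/code> should match the &lt;code>teffects ra&lt;/code> point estimates, but the standard deviations reported by &lt;code>summarize&lt;/code> are not valid standard errors for the ATE or ATT; use &lt;code>teffects ra&lt;/code> for inference.&lt;/p>
&lt;pre>&lt;code class="language-stata">preserve
keep if post==1
* Outcome model fitted on treated households, then predicted for everyone
quietly regress y age edu i.female i.poverty if treat==1
predict double mu1_hat, xb
* Outcome model fitted on control households, then predicted for everyone
quietly regress y age edu i.female i.poverty if treat==0
predict double mu0_hat, xb
* Household-level predicted effect
generate double te_hat = mu1_hat - mu0_hat
* Mean over all households: RA point estimate of the ATE
summarize te_hat
* Mean over treated households only: RA point estimate of the ATT
summarize te_hat if treat==1
* Built-in command for comparison
teffects ra (y age edu i.female i.poverty) (treat), ate
restore
&lt;/code>&lt;/pre>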
&lt;p>&lt;strong>What can go wrong &amp;mdash; model misspecification.&lt;/strong> RA&amp;rsquo;s Achilles heel is that it relies entirely on the outcome model being correctly specified. If consumption depends on age nonlinearly (for example, a U-shaped relationship), but we assume a linear model, the predictions $\hat{\mu}_1$ and $\hat{\mu}_0$ will be systematically wrong, biasing the ATE. As the Stata manual notes, RA works well when the outcome model is correct, but &amp;ldquo;relying on a correctly specified outcome model with little data is extremely risky.&amp;rdquo; RA gives the right answer &lt;strong>only if the outcome model is correct&lt;/strong>. If it is wrong, the ATE estimate can be biased even with infinite data.&lt;/p>
&lt;p>What if we are unsure about the functional form of the outcome model? Is there an approach that avoids modeling the outcome entirely?&lt;/p>
&lt;h3 id="72-inverse-probability-weighting-ipw-----modeling-the-treatment-assignment">7.2 Inverse Probability Weighting (IPW) &amp;mdash; modeling the treatment assignment&lt;/h3>
&lt;p>IPW takes the opposite approach. Instead of modeling consumption, it models the probability of being assigned to treatment &amp;mdash; the &lt;strong>propensity score&lt;/strong>, defined as $p(X) = \Pr(T = 1 \mid X)$. It then reweights observations so that the treatment and control groups become comparable. The Stata documentation explains: &lt;em>&amp;ldquo;IPW estimators use weighted averages of the observed outcome variable to estimate means of the potential outcomes. The weights account for the missing data inherent in the potential-outcome framework.&amp;rdquo;&lt;/em>&lt;/p>
&lt;p>The logic is elegant: in a perfectly randomized experiment, every household has the same 50% chance of treatment, and a simple comparison of means is unbiased. When chance imbalances arise (like our 9.3% gender SMD), the estimated propensity scores deviate slightly from 0.50. IPW corrects for these imbalances by making the reweighted sample look as if randomization had been perfect &amp;mdash; without ever modeling the outcome.&lt;/p>
&lt;p>&lt;strong>Analogy &amp;mdash; opinion polling.&lt;/strong> Election pollsters know their survey overrepresents some demographics. If 60% of respondents are college graduates but only 35% of voters are, pollsters give lower weight to each college graduate&amp;rsquo;s response and higher weight to non-graduates. IPW does the same thing for treatment groups &amp;mdash; it reweights households so the treated and control groups have the same covariate distribution.&lt;/p>
&lt;pre>&lt;code class="language-mermaid">graph TD
DATA[&amp;quot;&amp;lt;b&amp;gt;Observed Data&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;Treatment and control groups&amp;lt;br/&amp;gt;may have imbalances&amp;quot;]
PS[&amp;quot;&amp;lt;b&amp;gt;Estimate propensity score&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;p(X) = Pr(T=1 | X)&amp;lt;br/&amp;gt;&amp;lt;i&amp;gt;via logistic regression&amp;lt;/i&amp;gt;&amp;quot;]
WT[&amp;quot;&amp;lt;b&amp;gt;Compute weights&amp;lt;/b&amp;gt;&amp;quot;]
WTR[&amp;quot;Treated: weight = 1/p(X)&amp;quot;]
WCT[&amp;quot;Control: weight = 1/(1−p(X))&amp;quot;]
ATE[&amp;quot;&amp;lt;b&amp;gt;ATE&amp;lt;/b&amp;gt; = Weighted mean(treated)&amp;lt;br/&amp;gt;− Weighted mean(control)&amp;quot;]
DATA --&amp;gt; PS
PS --&amp;gt; WT
WT --&amp;gt; WTR
WT --&amp;gt; WCT
WTR --&amp;gt; ATE
WCT --&amp;gt; ATE
style DATA fill:#141413,stroke:#d97757,color:#fff
style PS fill:#d97757,stroke:#141413,color:#fff
style WT fill:#d97757,stroke:#141413,color:#fff
style WTR fill:#d97757,stroke:#141413,color:#fff
style WCT fill:#d97757,stroke:#141413,color:#fff
style ATE fill:#d97757,stroke:#141413,color:#fff
&lt;/code>&lt;/pre>
&lt;p>&lt;strong>The propensity score.&lt;/strong> The propensity score is estimated via logistic regression:&lt;/p>
&lt;p>$$\hat{p}(X_i) = \widehat{\Pr}(T_i = 1 \mid X_i) = \text{logit}^{-1}(\hat{\alpha} + \hat{\beta}' X_i)$$&lt;/p>
&lt;p>In plain language: we fit a logistic model predicting whether each household was assigned to treatment, based on their covariates (age, education, gender, poverty status). The predicted probability is their propensity score.&lt;/p>
&lt;p>&lt;strong>The IPW estimator.&lt;/strong> The ATE under IPW is:&lt;/p>
&lt;p>$$\hat{\tau}_{IPW}^{ATE} = \frac{1}{N} \sum_{i=1}^{N} \left[ \frac{T_i \cdot Y_i}{\hat{p}(X_i)} - \frac{(1 - T_i) \cdot Y_i}{1 - \hat{p}(X_i)} \right]$$&lt;/p>
&lt;p>Each treated household&amp;rsquo;s outcome is divided by its probability of being treated &amp;mdash; this upweights treated households that &amp;ldquo;look like&amp;rdquo; control households (the Stata manual calls this placing &amp;ldquo;a larger weight on those observations for which $y_{1i}$ is observed even though its observation was not likely&amp;rdquo;). Each control household&amp;rsquo;s outcome is divided by its probability of being in the control group. The reweighting creates a pseudo-population where treatment assignment is independent of covariates.&lt;/p>
&lt;p>For the ATT, only the control group needs reweighting (because the treated group is already the reference population):&lt;/p>
&lt;p>$$\hat{\tau}_{IPW}^{ATT} = \frac{1}{N_1} \sum_{i=1}^{N} \left[ T_i \cdot Y_i - \frac{(1 - T_i) \cdot \hat{p}(X_i) \cdot Y_i}{1 - \hat{p}(X_i)} \right]$$&lt;/p>
&lt;p>&lt;strong>Mini example from our data.&lt;/strong> In our RCT, a female household in poverty might have $\hat{p}(X) = 0.52$ (slightly more likely to be treated due to the gender imbalance). If treated, her weight is $1/0.52 = 1.92$. If in the control group, her weight is $1/(1 - 0.52) = 2.08$. A male non-poor household might have $\hat{p}(X) = 0.49$, giving weights close to 2.0 in either group. These mild adjustments rebalance the groups to remove the chance gender imbalance.&lt;/p>
&lt;p>&lt;strong>Why IPW matters even in RCTs.&lt;/strong> In a perfect RCT, the true propensity score is exactly 0.50 for everyone, and IPW does nothing. But finite samples produce chance imbalances. IPW uses the estimated propensity scores (which deviate slightly from 0.50) to correct for these imbalances without making any assumptions about how covariates affect the outcome.&lt;/p>
&lt;p>&lt;strong>Stata implementation.&lt;/strong> The &lt;code>teffects ipw&lt;/code> command fits a logistic treatment model by default. Note that the first parenthesis specifies only the outcome variable (no covariates &amp;mdash; IPW does not model the outcome), and the second specifies the treatment model: &lt;code>teffects ipw (y) (treat c.age c.edu i.female i.poverty), ate&lt;/code>.&lt;/p>
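&lt;p>To see the weights themselves, the IPW steps can be carried out manually. This is a hedged sketch under the tutorial's dataset and variable names; &lt;code>ps_hat&lt;/code> and &lt;code>w_ate&lt;/code> are illustrative names introduced here. The weighted regression normalizes the weights within each arm, so it is a close cousin of the IPW equation above (which divides by $N$) rather than a literal transcription. Its standard errors ignore the fact that the propensity score was estimated, so &lt;code>teffects ipw&lt;/code> remains the command to use for inference.&lt;/p>
&lt;pre>&lt;code class="language-stata">* Manual inverse probability weighting (illustrative sketch)
use &amp;quot;https://github.com/quarcs-lab/data-open/raw/master/ametrics/dataSIM4RCT.dta&amp;quot;, clear
keep if post==1

* Step 1: propensity score from a logit of treatment on covariates
quietly logit treat c.age c.edu i.female i.poverty
predict double ps_hat, pr

* Step 2: ATE weights: 1/p for treated, 1/(1-p) for control
generate double w_ate = cond(treat==1, 1/ps_hat, 1/(1 - ps_hat))
summarize ps_hat w_ate

* Step 3: weighted difference in means between the two arms
regress y i.treat [pweight = w_ate]
&lt;/code>&lt;/pre>
&lt;p>The &lt;code>summarize&lt;/code> line is worth reading before the regression: in a randomized design the weights should all sit close to 2.&lt;/p>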
&lt;p>&lt;strong>What can go wrong &amp;mdash; extreme weights.&lt;/strong> IPW&amp;rsquo;s vulnerability is extreme propensity scores. If $\hat{p}(X) = 0.01$ for some treated household, its weight becomes $1/0.01 = 100$ &amp;mdash; that single household dominates the ATE estimate, causing high variance and instability. The same happens for a control household whose propensity score is close to 1. The Stata manual warns: &lt;em>&amp;ldquo;When propensity scores are extreme (near 0 or 1), the inverse weights become very large, producing unstable estimates.&amp;rdquo;&lt;/em> This happens when the treatment and control groups have poor &lt;strong>overlap&lt;/strong> &amp;mdash; some covariate combinations appear only in one group. In our well-designed RCT, all propensity scores are between 0.43 and 0.55 (we verified this in Section 5.4), so extreme weights are not a concern.&lt;/p>
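&lt;p>Overlap and balance can be checked directly after fitting the IPW model. The sketch below assumes the tutorial's dataset and variable names and uses Stata's built-in postestimation commands for &lt;code>teffects&lt;/code>; it is one reasonable diagnostic routine, not the only one.&lt;/p>
&lt;pre>&lt;code class="language-stata">* Overlap and balance diagnostics after IPW (illustrative sketch)
use &amp;quot;https://github.com/quarcs-lab/data-open/raw/master/ametrics/dataSIM4RCT.dta&amp;quot;, clear
keep if post==1

quietly teffects ipw (y) (treat c.age c.edu i.female i.poverty), ate

* Standardized differences and variance ratios, raw versus weighted
tebalance summarize

* Densities of the estimated propensity scores by treatment arm
teffects overlap
&lt;/code>&lt;/pre>
&lt;p>Densities that pile up near 0 or 1, or that barely intersect, would signal the overlap problem described above.&lt;/p>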
&lt;p>RA works well if the outcome model is correct but can be biased if it is wrong. IPW works well if the propensity score model is correct, but it is biased if that model is wrong and unstable when the weights are extreme. Is there a method that protects us against both types of misspecification?&lt;/p>
&lt;h3 id="73-doubly-robust-dr-----modeling-both">7.3 Doubly Robust (DR) &amp;mdash; modeling both&lt;/h3>
&lt;p>Doubly robust methods combine RA and IPW into a single estimator. They fit an outcome model &lt;strong>and&lt;/strong> estimate a propensity score. The key property &amp;mdash; the reason they are called &amp;ldquo;doubly robust&amp;rdquo; &amp;mdash; is that the estimator is consistent (converges to the true treatment effect with enough data) if &lt;strong>either&lt;/strong> the outcome model &lt;strong>or&lt;/strong> the propensity score model is correctly specified. You do not need both to be right &amp;mdash; just one.&lt;/p>
&lt;p>The Stata manual describes this property: &lt;em>&amp;ldquo;AIPW estimators model both the outcome and the treatment probability. A surprising fact is that only one of the two models must be correctly specified to consistently estimate the treatment effects.&amp;rdquo;&lt;/em>&lt;/p>
&lt;p>&lt;strong>Analogy &amp;mdash; backup power.&lt;/strong> Think of a house with two independent power sources: the electrical grid (the outcome model) and a solar panel system (the propensity score model). If the grid goes down (outcome model is misspecified), solar power keeps the lights on. If clouds block the solar panels (propensity score model is wrong), the grid still works. As long as at least one power source is functioning, the house stays lit. That is doubly robust estimation &amp;mdash; as long as at least one model is correct, the estimator gives the right answer.&lt;/p>
&lt;pre>&lt;code class="language-mermaid">graph TD
DATA[&amp;quot;&amp;lt;b&amp;gt;Observed Data&amp;lt;/b&amp;gt;&amp;quot;]
RA_C[&amp;quot;&amp;lt;b&amp;gt;RA component&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;Predict Ŷ₁ and Ŷ₀&amp;lt;br/&amp;gt;for each household&amp;quot;]
IPW_C[&amp;quot;&amp;lt;b&amp;gt;IPW component&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;Estimate propensity&amp;lt;br/&amp;gt;score p(X)&amp;quot;]
RESID[&amp;quot;&amp;lt;b&amp;gt;Prediction errors&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;Y − Ŷ for each&amp;lt;br/&amp;gt;household&amp;quot;]
CORRECT[&amp;quot;&amp;lt;b&amp;gt;Bias-correction term&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;IPW-weighted residuals&amp;quot;]
DR[&amp;quot;&amp;lt;b&amp;gt;DR estimate&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;= RA prediction&amp;lt;br/&amp;gt;+ Bias correction&amp;quot;]
DATA --&amp;gt; RA_C
DATA --&amp;gt; IPW_C
RA_C --&amp;gt; RESID
IPW_C --&amp;gt; CORRECT
RESID --&amp;gt; CORRECT
RA_C --&amp;gt; DR
CORRECT --&amp;gt; DR
style DATA fill:#141413,stroke:#00d4c8,color:#fff
style RA_C fill:#6a9bcc,stroke:#141413,color:#fff
style IPW_C fill:#d97757,stroke:#141413,color:#fff
style RESID fill:#6a9bcc,stroke:#141413,color:#fff
style CORRECT fill:#d97757,stroke:#141413,color:#fff
style DR fill:#00d4c8,stroke:#141413,color:#141413
&lt;/code>&lt;/pre>
&lt;p>&lt;strong>The AIPW estimator.&lt;/strong> The most common doubly robust form is Augmented Inverse Probability Weighting (AIPW):&lt;/p>
&lt;p>$$\hat{\tau}_{DR}^{ATE} = \frac{1}{N} \sum_{i=1}^{N} \left[ \hat{\mu}_1(X_i) - \hat{\mu}_0(X_i) + \frac{T_i (Y_i - \hat{\mu}_1(X_i))}{\hat{p}(X_i)} - \frac{(1 - T_i)(Y_i - \hat{\mu}_0(X_i))}{1 - \hat{p}(X_i)} \right]$$&lt;/p>
&lt;p>This equation has two clearly interpretable components:&lt;/p>
&lt;ul>
&lt;li>
&lt;p>&lt;strong>RA component&lt;/strong> (first two terms): $\hat{\mu}_1(X_i) - \hat{\mu}_0(X_i)$ &amp;mdash; the regression adjustment prediction, exactly as in Section 7.1&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Bias-correction component&lt;/strong> (last two terms): IPW-weighted residuals $(Y_i - \hat{\mu})$ &amp;mdash; the difference between actual and predicted outcomes, weighted by inverse propensity scores&lt;/p>
&lt;/li>
&lt;/ul>
&lt;p>In plain language: start with the RA prediction of each household&amp;rsquo;s treatment effect. Then ask: how far off was that prediction from reality? Weight those prediction errors by the inverse of the propensity score (treated households by $1/\hat{p}$, control households by $1/(1-\hat{p})$). If RA was already right, the weighted errors average to zero and you just get RA. If RA was wrong but the propensity score model is right, the weighted errors cancel the RA bias in large samples.&lt;/p>
&lt;p>&lt;strong>Why the magic works &amp;mdash; four scenarios.&lt;/strong>&lt;/p>
&lt;ol>
&lt;li>&lt;strong>Outcome model correct, propensity model wrong:&lt;/strong> The residuals $(Y_i - \hat{\mu})$ are zero on average, so the correction terms vanish. DR reduces to RA. Correct answer.&lt;/li>
&lt;li>&lt;strong>Propensity model correct, outcome model wrong:&lt;/strong> The IPW reweighting is valid, so the correction terms fix the RA bias. Correct answer.&lt;/li>
&lt;li>&lt;strong>Both models correct:&lt;/strong> Both components work together, producing the most efficient estimate.&lt;/li>
&lt;li>&lt;strong>Both models wrong:&lt;/strong> Neither safety net catches the error. The estimate can be biased. DR provides insurance, not invincibility.&lt;/li>
&lt;/ol>
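&lt;p>The AIPW equation can be assembled term by term, which makes the split between the RA component and the bias correction visible. This is a minimal sketch under the tutorial's dataset and variable names; &lt;code>m1&lt;/code>, &lt;code>m0&lt;/code>, &lt;code>ps&lt;/code>, &lt;code>ra_part&lt;/code>, &lt;code>corr_part&lt;/code>, and &lt;code>aipw_i&lt;/code> are illustrative names introduced here. The mean of &lt;code>aipw_i&lt;/code> should be close to the &lt;code>teffects aipw&lt;/code> point estimate, and valid standard errors still require &lt;code>teffects aipw&lt;/code>.&lt;/p>
&lt;pre>&lt;code class="language-stata">* Manual AIPW, following the equation term by term (illustrative sketch)
use &amp;quot;https://github.com/quarcs-lab/data-open/raw/master/ametrics/dataSIM4RCT.dta&amp;quot;, clear
keep if post==1

* Outcome models, one per arm, predicted for every household
quietly regress y c.age c.edu i.female i.poverty if treat==1
predict double m1, xb
quietly regress y c.age c.edu i.female i.poverty if treat==0
predict double m0, xb

* Treatment model
quietly logit treat c.age c.edu i.female i.poverty
predict double ps, pr

* RA component plus inverse-probability-weighted residuals
generate double ra_part   = m1 - m0
generate double corr_part = treat*(y - m1)/ps - (1 - treat)*(y - m0)/(1 - ps)
generate double aipw_i    = ra_part + corr_part

summarize ra_part corr_part aipw_i
&lt;/code>&lt;/pre>
&lt;p>When the outcome model fits well, the mean of &lt;code>corr_part&lt;/code> is close to zero and the AIPW estimate is essentially the RA estimate, which is scenario 1 in the list above.&lt;/p>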
&lt;p>&lt;strong>AIPW vs. IPWRA in Stata.&lt;/strong> Stata offers two doubly robust commands. &lt;code>teffects aipw&lt;/code> augments the IPW estimator with an outcome-model correction (the equation above). &lt;code>teffects ipwra&lt;/code> applies propensity score weights to the regression adjustment &amp;mdash; arriving at the same property from the other direction. Both are doubly robust and produce nearly identical results in practice.&lt;/p>
&lt;p>&lt;strong>Stata implementation.&lt;/strong> Both commands require specifying the outcome model in the first parenthesis and the treatment model in the second: &lt;code>teffects ipwra (y c.age c.edu i.female i.poverty) (treat c.age c.edu i.female i.poverty), vce(robust)&lt;/code>.&lt;/p>
&lt;p>&lt;strong>What can go wrong.&lt;/strong> DR fails only when &lt;strong>both&lt;/strong> models are wrong. This is much less likely than either single model being wrong &amp;mdash; getting at least one model approximately right is much easier than getting both perfectly right. However, the Stata manual notes: &lt;em>&amp;ldquo;When both the outcome and the treatment model are misspecified, which estimator is more robust is a matter of debate.&amp;rdquo;&lt;/em> Using flexible specifications (polynomials, interactions) reduces the risk of both models failing simultaneously.&lt;/p>
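&lt;p>As an illustration of a more flexible specification, factor-variable notation can add a quadratic in age and a gender-by-poverty interaction to the models. The particular terms below are an assumption chosen for illustration, not a specification used elsewhere in this tutorial.&lt;/p>
&lt;pre>&lt;code class="language-stata">* IPWRA with a more flexible specification (illustrative sketch)
use &amp;quot;https://github.com/quarcs-lab/data-open/raw/master/ametrics/dataSIM4RCT.dta&amp;quot;, clear
keep if post==1

* c.age##c.age adds age and age squared; i.female##i.poverty adds the interaction
teffects ipwra (y c.age##c.age c.edu i.female##i.poverty) ///
    (treat c.age##c.age c.edu i.female i.poverty), ate
&lt;/code>&lt;/pre>
&lt;p>If the estimate barely moves relative to the linear specification, that is mild reassurance that functional form is not driving the result.&lt;/p>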
&lt;h3 id="comparison-of-the-three-approaches">Comparison of the three approaches&lt;/h3>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th>Feature&lt;/th>
&lt;th>RA&lt;/th>
&lt;th>IPW&lt;/th>
&lt;th>DR (AIPW/IPWRA)&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>Models the outcome?&lt;/td>
&lt;td>Yes&lt;/td>
&lt;td>No&lt;/td>
&lt;td>Yes&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Models the treatment?&lt;/td>
&lt;td>No&lt;/td>
&lt;td>Yes&lt;/td>
&lt;td>Yes&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Key equation&lt;/td>
&lt;td>$\hat{\mu}_1(X) - \hat{\mu}_0(X)$&lt;/td>
&lt;td>$\frac{T \cdot Y}{\hat{p}(X)} - \frac{(1 - T) \cdot Y}{1 - \hat{p}(X)}$&lt;/td>
&lt;td>RA + IPW-weighted residuals&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Consistent if outcome model correct?&lt;/td>
&lt;td>Yes&lt;/td>
&lt;td>&amp;mdash;&lt;/td>
&lt;td>Yes&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Consistent if treatment model correct?&lt;/td>
&lt;td>&amp;mdash;&lt;/td>
&lt;td>Yes&lt;/td>
&lt;td>Yes&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Main vulnerability&lt;/td>
&lt;td>Outcome misspecification&lt;/td>
&lt;td>Extreme weights&lt;/td>
&lt;td>Both models wrong&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Stata command&lt;/td>
&lt;td>&lt;code>teffects ra&lt;/code>&lt;/td>
&lt;td>&lt;code>teffects ipw&lt;/code>&lt;/td>
&lt;td>&lt;code>teffects ipwra&lt;/code> / &lt;code>teffects aipw&lt;/code>&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;pre>&lt;code class="language-mermaid">graph LR
RA[&amp;quot;&amp;lt;b&amp;gt;Regression Adjustment&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;Models the outcome&amp;quot;]
IPW[&amp;quot;&amp;lt;b&amp;gt;Inverse Probability&amp;lt;br/&amp;gt;Weighting&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;Models the treatment&amp;quot;]
DR[&amp;quot;&amp;lt;b&amp;gt;Doubly Robust&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;Models both&amp;lt;br/&amp;gt;&amp;lt;i&amp;gt;Consistent if either&amp;lt;br/&amp;gt;model is correct&amp;lt;/i&amp;gt;&amp;quot;]
RA --&amp;gt; DR
IPW --&amp;gt; DR
style RA fill:#6a9bcc,stroke:#141413,color:#fff
style IPW fill:#d97757,stroke:#141413,color:#fff
style DR fill:#00d4c8,stroke:#141413,color:#141413
&lt;/code>&lt;/pre>
&lt;p>The doubly robust estimator combines the strengths of both RA and IPW. It is &lt;strong>widely recommended in modern causal inference&lt;/strong> because it provides an extra layer of protection against model misspecification. Now that we understand what each method does, what it assumes, and what can go wrong, let us apply all three to our cash transfer data and compare their results.&lt;/p>
&lt;hr>
&lt;h2 id="8-cross-sectional-estimation-at-endline-----ra-ipw-and-dr">8. Cross-sectional estimation at endline &amp;mdash; RA, IPW, and DR&lt;/h2>
&lt;p>We now estimate treatment effects using only endline data. For each method, we compute both the &lt;strong>ATE&lt;/strong> (the policymaker&amp;rsquo;s quantity) and the &lt;strong>ATT&lt;/strong> (the evaluator&amp;rsquo;s quantity).&lt;/p>
&lt;h3 id="81-simple-difference-in-means">8.1 Simple difference in means&lt;/h3>
&lt;p>The simplest approach is to compare mean outcomes between treated and control groups at endline.&lt;/p>
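&lt;pre>&lt;code class="language-stata">* Load the simulated panel and keep only the endline wave
use &amp;quot;https://github.com/quarcs-lab/data-open/raw/master/ametrics/dataSIM4RCT.dta&amp;quot;, clear
keep if post==1
* Difference in means via OLS with heteroskedasticity-robust standard errors
reg y treat, robust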
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">Linear regression Number of obs = 2,000
F(1, 1998) = 35.43
Prob &amp;gt; F = 0.0000
R-squared = 0.0174
Root MSE = .43449
──────────────────────────────────────────────────────────────────────────────
| Robust
y | Coefficient std. err. t P&amp;gt;|t| [95% conf. interval]
─────────────+────────────────────────────────────────────────────────────────
treat | .1157465 .0194443 5.95 0.000 .0776132 .1538798
_cons | 10.05374 .014001 718.07 0.000 10.02628 10.0812
──────────────────────────────────────────────────────────────────────────────
&lt;/code>&lt;/pre>
&lt;p>The simple difference in means yields an estimate of 0.116 (SE = 0.019, p &amp;lt; 0.001, 95% CI [0.078, 0.154]). Because the outcome is in logs, this means being offered the cash transfer increased household consumption by approximately 11.6% (about 12.3% using the exact conversion $e^{0.116} - 1$). This estimate is close to the true effect of 0.12 log points and is our benchmark for comparison. However, it does not adjust for the gender imbalance we discovered at baseline.&lt;/p>
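&lt;p>Reading a log-point coefficient as a percentage is an approximation that works well for small effects. The exact percentage change is $100 \times (e^{\hat{\beta}} - 1)$, and &lt;code>nlcom&lt;/code> can compute it with a delta-method standard error. The sketch below assumes the same dataset and regression as above; the label &lt;code>pct_effect&lt;/code> is an illustrative name.&lt;/p>
&lt;pre>&lt;code class="language-stata">* Exact percentage effect implied by the log-point coefficient (illustrative sketch)
use &amp;quot;https://github.com/quarcs-lab/data-open/raw/master/ametrics/dataSIM4RCT.dta&amp;quot;, clear
keep if post==1

quietly regress y treat, vce(robust)

* 100*(exp(b) - 1), with a delta-method standard error and confidence interval
nlcom (pct_effect: 100*(exp(_b[treat]) - 1))
&lt;/code>&lt;/pre>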
&lt;h3 id="82-regression-adjustment-----ate-and-att">8.2 Regression Adjustment &amp;mdash; ATE and ATT&lt;/h3>
&lt;p>Regression adjustment models the outcome as a function of treatment and covariates, then computes predicted outcomes under treatment and control for each observation.&lt;/p>
&lt;pre>&lt;code class="language-stata">* RA: Average Treatment Effect (ATE)
teffects ra (y c.age c.edu i.female i.poverty) (treat), ate
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">Treatment-effects estimation Number of obs = 2,000
Estimator : regression adjustment
Outcome model : linear
──────────────────────────────────────────────────────────────────────────────
| Robust
y | Coefficient std. err. z P&amp;gt;|z| [95% conf. interval]
─────────────+────────────────────────────────────────────────────────────────
ATE |
treat |
(1 vs 0) | .1125431 .0190927 5.89 0.000 .0751221 .1499641
─────────────+────────────────────────────────────────────────────────────────
POmean |
treat |
0 | 10.05503 .0138703 724.93 0.000 10.02785 10.08222
──────────────────────────────────────────────────────────────────────────────
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-stata">* RA: Average Treatment Effect on the Treated (ATT)
teffects ra (y c.age c.edu i.female i.poverty) (treat), atet
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">Treatment-effects estimation Number of obs = 2,000
Estimator : regression adjustment
Outcome model : linear
──────────────────────────────────────────────────────────────────────────────
| Robust
y | Coefficient std. err. z P&amp;gt;|z| [95% conf. interval]
─────────────+────────────────────────────────────────────────────────────────
ATET |
treat |
(1 vs 0) | .1132537 .0191498 5.91 0.000 .0757208 .1507865
─────────────+────────────────────────────────────────────────────────────────
POmean |
treat |
0 | 10.05623 .0140082 717.88 0.000 10.02878 10.08369
──────────────────────────────────────────────────────────────────────────────
&lt;/code>&lt;/pre>
&lt;p>The RA estimates are ATE = 0.113 (SE = 0.019, 95% CI [0.075, 0.150]) and ATT = 0.113 (SE = 0.019, 95% CI [0.076, 0.151]). The ATE and ATT are nearly identical. This is what randomization leads us to expect: the treated households are a random subset of all households, so the two estimands coincide in expectation whether or not effects vary across households. The result is therefore consistent with, but does not by itself confirm, &lt;strong>homogeneous&lt;/strong> treatment effects. The RA approach models the outcome with covariates (age, education, gender, poverty), which adjusts for the baseline gender imbalance and can improve precision.&lt;/p>
&lt;h3 id="83-inverse-probability-weighting-----ate-and-att">8.3 Inverse Probability Weighting &amp;mdash; ATE and ATT&lt;/h3>
&lt;p>IPW reweights observations based on their estimated probability of treatment, without modeling the outcome.&lt;/p>
&lt;pre>&lt;code class="language-stata">* IPW: Average Treatment Effect (ATE)
teffects ipw (y) (treat c.age c.edu i.female i.poverty), ate
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">Treatment-effects estimation Number of obs = 2,000
Estimator : inverse-probability weights
Outcome model : weighted mean
Treatment model: logit
──────────────────────────────────────────────────────────────────────────────
| Robust
y | Coefficient std. err. z P&amp;gt;|z| [95% conf. interval]
─────────────+────────────────────────────────────────────────────────────────
ATE |
treat |
(1 vs 0) | .1126713 .0190886 5.90 0.000 .0752583 .1500844
─────────────+────────────────────────────────────────────────────────────────
POmean |
treat |
0 | 10.05495 .0138651 725.20 0.000 10.02778 10.08213
──────────────────────────────────────────────────────────────────────────────
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-stata">* IPW: Average Treatment Effect on the Treated (ATT)
teffects ipw (y) (treat c.age c.edu i.female i.poverty), atet
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">Treatment-effects estimation Number of obs = 2,000
Estimator : inverse-probability weights
Outcome model : weighted mean
Treatment model: logit
──────────────────────────────────────────────────────────────────────────────
| Robust
y | Coefficient std. err. z P&amp;gt;|z| [95% conf. interval]
─────────────+────────────────────────────────────────────────────────────────
ATET |
treat |
(1 vs 0) | .1134031 .0191397 5.93 0.000 .0758899 .1509162
─────────────+────────────────────────────────────────────────────────────────
POmean |
treat |
0 | 10.05608 .0140004 718.27 0.000 10.02864 10.08352
──────────────────────────────────────────────────────────────────────────────
&lt;/code>&lt;/pre>
&lt;p>The IPW estimates are ATE = 0.113 (SE = 0.019, 95% CI [0.075, 0.150]) and ATT = 0.113 (SE = 0.019, 95% CI [0.076, 0.151]). These are very close to the RA results, which is expected in a well-designed RCT where propensity scores are near 0.50 for all households. Notice that IPW does &lt;strong>not&lt;/strong> model the outcome &amp;mdash; it only models the treatment assignment process using the propensity score. The close agreement between RA and IPW gives us confidence that both the outcome model and the treatment model are approximately correct.&lt;/p>
&lt;h3 id="84-doubly-robust-----ate-and-att-ipwra">8.4 Doubly Robust &amp;mdash; ATE and ATT (IPWRA)&lt;/h3>
&lt;p>The doubly robust IPWRA estimator combines outcome modeling and propensity score weighting.&lt;/p>
&lt;pre>&lt;code class="language-stata">* IPWRA: Average Treatment Effect (ATE)
teffects ipwra (y c.age c.edu i.female i.poverty) ///
(treat c.age c.edu i.female i.poverty), vce(robust)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">Treatment-effects estimation Number of obs = 2,000
Estimator : IPW regression adjustment
Outcome model : linear
Treatment model: logit
──────────────────────────────────────────────────────────────────────────────
| Robust
y | Coefficient std. err. z P&amp;gt;|z| [95% conf. interval]
─────────────+────────────────────────────────────────────────────────────────
ATE |
treat |
(1 vs 0) | .112639 .0190901 5.90 0.000 .0752231 .1500549
─────────────+────────────────────────────────────────────────────────────────
POmean |
treat |
0 | 10.055 .0138677 725.07 0.000 10.02782 10.08218
──────────────────────────────────────────────────────────────────────────────
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-stata">* IPWRA: Average Treatment Effect on the Treated (ATT)
teffects ipwra (y c.age c.edu i.female i.poverty) ///
(treat c.age c.edu i.female i.poverty), atet vce(robust)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">Treatment-effects estimation Number of obs = 2,000
Estimator : IPW regression adjustment
Outcome model : linear
Treatment model: logit
──────────────────────────────────────────────────────────────────────────────
| Robust
y | Coefficient std. err. z P&amp;gt;|z| [95% conf. interval]
─────────────+────────────────────────────────────────────────────────────────
ATET |
treat |
(1 vs 0) | .1133162 .0191469 5.92 0.000 .0757889 .1508435
─────────────+────────────────────────────────────────────────────────────────
POmean |
treat |
0 | 10.05617 .0140019 718.20 0.000 10.02873 10.08361
──────────────────────────────────────────────────────────────────────────────
&lt;/code>&lt;/pre>
&lt;p>The doubly robust IPWRA estimates are ATE = 0.113 (SE = 0.019, 95% CI [0.075, 0.150]) and ATT = 0.113 (SE = 0.019, 95% CI [0.076, 0.151]). These are very close to the RA and IPW estimates, confirming that all three approaches converge in this well-designed RCT. The DR method provides the most reliable cross-sectional estimate because it is protected against misspecification of either the outcome or treatment model.&lt;/p>
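&lt;p>By default &lt;code>teffects&lt;/code> reports only the treatment effect and the control potential-outcome mean. To inspect the two fitted models behind the IPWRA estimate, the &lt;code>aequations&lt;/code> display option prints the auxiliary equations as well. A brief sketch, assuming the same dataset and specification as above:&lt;/p>
&lt;pre>&lt;code class="language-stata">* Show the outcome-model and treatment-model coefficients alongside the ATE
use &amp;quot;https://github.com/quarcs-lab/data-open/raw/master/ametrics/dataSIM4RCT.dta&amp;quot;, clear
keep if post==1

teffects ipwra (y c.age c.edu i.female i.poverty) ///
    (treat c.age c.edu i.female i.poverty), ate aequations
&lt;/code>&lt;/pre>
&lt;p>In a randomized design the treatment-model coefficients should be small, since covariates have little power to predict assignment.&lt;/p>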
&lt;h3 id="85-doubly-robust-----aipw-alternative">8.5 Doubly Robust &amp;mdash; AIPW alternative&lt;/h3>
&lt;p>As a robustness check, we can also compute the doubly robust estimate using the AIPW formulation instead of IPWRA.&lt;/p>
&lt;pre>&lt;code class="language-stata">* AIPW: Average Treatment Effect (ATE)
teffects aipw (y c.age c.edu i.female i.poverty) ///
(treat c.age c.edu i.female i.poverty)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">Treatment-effects estimation Number of obs = 2,000
Estimator : augmented IPW
Outcome model : linear by ML
Treatment model: logit
──────────────────────────────────────────────────────────────────────────────
| Robust
y | Coefficient std. err. z P&amp;gt;|z| [95% conf. interval]
─────────────+────────────────────────────────────────────────────────────────
ATE |
treat |
(1 vs 0) | .1126412 .0190903 5.90 0.000 .075225 .1500574
─────────────+────────────────────────────────────────────────────────────────
POmean |
treat |
0 | 10.055 .013868 725.05 0.000 10.02782 10.08218
──────────────────────────────────────────────────────────────────────────────
&lt;/code>&lt;/pre>
&lt;p>The AIPW estimate of ATE = 0.113 (SE = 0.019, 95% CI [0.075, 0.150]) is virtually identical to the IPWRA result (0.113). Both are doubly robust &amp;mdash; the difference lies in the computational approach (AIPW augments the IPW estimator with a bias-correction term, while IPWRA applies IPW weights to the regression adjustment). The theoretical properties are the same, and here the two point estimates differ by only about 0.000002.&lt;/p>
&lt;h3 id="86-cross-sectional-comparison">8.6 Cross-sectional comparison&lt;/h3>
&lt;p>The table below summarizes all cross-sectional estimates.&lt;/p>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th>Method&lt;/th>
&lt;th>Approach&lt;/th>
&lt;th>Estimand&lt;/th>
&lt;th style="text-align:center">Estimate&lt;/th>
&lt;th style="text-align:center">SE&lt;/th>
&lt;th style="text-align:center">95% CI&lt;/th>
&lt;th style="text-align:center">Contains 0.12?&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>Simple regression&lt;/td>
&lt;td>None&lt;/td>
&lt;td>ATE&lt;/td>
&lt;td style="text-align:center">0.116&lt;/td>
&lt;td style="text-align:center">0.019&lt;/td>
&lt;td style="text-align:center">[0.078, 0.154]&lt;/td>
&lt;td style="text-align:center">Yes&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Regression Adjustment&lt;/td>
&lt;td>Outcome model&lt;/td>
&lt;td>ATE&lt;/td>
&lt;td style="text-align:center">0.113&lt;/td>
&lt;td style="text-align:center">0.019&lt;/td>
&lt;td style="text-align:center">[0.075, 0.150]&lt;/td>
&lt;td style="text-align:center">Yes&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Regression Adjustment&lt;/td>
&lt;td>Outcome model&lt;/td>
&lt;td>ATT&lt;/td>
&lt;td style="text-align:center">0.113&lt;/td>
&lt;td style="text-align:center">0.019&lt;/td>
&lt;td style="text-align:center">[0.076, 0.151]&lt;/td>
&lt;td style="text-align:center">Yes&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Inverse Prob. Weighting&lt;/td>
&lt;td>Treatment model&lt;/td>
&lt;td>ATE&lt;/td>
&lt;td style="text-align:center">0.113&lt;/td>
&lt;td style="text-align:center">0.019&lt;/td>
&lt;td style="text-align:center">[0.075, 0.150]&lt;/td>
&lt;td style="text-align:center">Yes&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Inverse Prob. Weighting&lt;/td>
&lt;td>Treatment model&lt;/td>
&lt;td>ATT&lt;/td>
&lt;td style="text-align:center">0.113&lt;/td>
&lt;td style="text-align:center">0.019&lt;/td>
&lt;td style="text-align:center">[0.076, 0.151]&lt;/td>
&lt;td style="text-align:center">Yes&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>IPWRA (Doubly Robust)&lt;/td>
&lt;td>Both models&lt;/td>
&lt;td>ATE&lt;/td>
&lt;td style="text-align:center">0.113&lt;/td>
&lt;td style="text-align:center">0.019&lt;/td>
&lt;td style="text-align:center">[0.075, 0.150]&lt;/td>
&lt;td style="text-align:center">Yes&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>IPWRA (Doubly Robust)&lt;/td>
&lt;td>Both models&lt;/td>
&lt;td>ATT&lt;/td>
&lt;td style="text-align:center">0.113&lt;/td>
&lt;td style="text-align:center">0.019&lt;/td>
&lt;td style="text-align:center">[0.076, 0.151]&lt;/td>
&lt;td style="text-align:center">Yes&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>AIPW (Doubly Robust)&lt;/td>
&lt;td>Both models&lt;/td>
&lt;td>ATE&lt;/td>
&lt;td style="text-align:center">0.113&lt;/td>
&lt;td style="text-align:center">0.019&lt;/td>
&lt;td style="text-align:center">[0.075, 0.150]&lt;/td>
&lt;td style="text-align:center">Yes&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;strong>True effect&lt;/strong>&lt;/td>
&lt;td>&lt;/td>
&lt;td>&lt;/td>
&lt;td style="text-align:center">&lt;strong>0.12&lt;/strong>&lt;/td>
&lt;td style="text-align:center">&lt;/td>
&lt;td style="text-align:center">&lt;/td>
&lt;td style="text-align:center">&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;p>Several patterns emerge from this comparison. First, &lt;strong>ATE and ATT are nearly identical&lt;/strong> for every method. This is expected when treatment is randomized, and it is consistent with homogeneous treatment effects without proving them. Second, &lt;strong>RA, IPW, and DR all give remarkably similar results&lt;/strong> (all approximately 0.113) because, in this well-designed RCT, the true propensity score is constant at 0.50, so the treatment model is correctly specified by design and the covariates only correct small chance imbalances. Third, the simple difference in means (0.116) is slightly higher than the covariate-adjusted estimates (0.113). That gap reflects the adjustment for chance covariate imbalances such as the gender imbalance, while the precision gain from covariates shows up separately as a slightly smaller standard error (0.0191 versus 0.0194). Finally, all confidence intervals contain the true effect of 0.12 &amp;mdash; every method successfully recovers the correct answer.&lt;/p>
&lt;p>The real value of doubly robust methods becomes apparent in less ideal settings. When one model might be misspecified &amp;mdash; a common situation in practice &amp;mdash; DR methods provide insurance that RA or IPW alone cannot offer.&lt;/p>
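&lt;p>The ATE rows of the comparison can be collected side by side with &lt;code>estimates store&lt;/code> and &lt;code>estimates table&lt;/code>. This is a sketch under the tutorial's dataset and specifications; the stored names (&lt;code>ra_ate&lt;/code>, &lt;code>ipw_ate&lt;/code>, &lt;code>ipwra_ate&lt;/code>, &lt;code>aipw_ate&lt;/code>) are illustrative, and &lt;code>keep(ATE:)&lt;/code> assumes the treatment-effect equation is named &lt;code>ATE&lt;/code>, as in the output shown above.&lt;/p>
&lt;pre>&lt;code class="language-stata">* Collect the four ATE estimates in one table (illustrative sketch)
use &amp;quot;https://github.com/quarcs-lab/data-open/raw/master/ametrics/dataSIM4RCT.dta&amp;quot;, clear
keep if post==1

quietly teffects ra (y c.age c.edu i.female i.poverty) (treat), ate
estimates store ra_ate

quietly teffects ipw (y) (treat c.age c.edu i.female i.poverty), ate
estimates store ipw_ate

quietly teffects ipwra (y c.age c.edu i.female i.poverty) ///
    (treat c.age c.edu i.female i.poverty), ate
estimates store ipwra_ate

quietly teffects aipw (y c.age c.edu i.female i.poverty) ///
    (treat c.age c.edu i.female i.poverty), ate
estimates store aipw_ate

* Show only the treatment-effect equation, with standard errors
estimates table ra_ate ipw_ate ipwra_ate aipw_ate, keep(ATE:) b(%9.4f) se(%9.4f)
&lt;/code>&lt;/pre>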
&lt;hr>
&lt;h2 id="9-leveraging-panel-data-----difference-in-differences">9. Leveraging panel data &amp;mdash; Difference-in-Differences&lt;/h2>
&lt;p>All estimates in Section 8 used only endline data. But we have panel data &amp;mdash; the same 2,000 households observed before and after the intervention. Can we do better?&lt;/p>
&lt;h3 id="91-why-use-panel-data">9.1 Why use panel data?&lt;/h3>
&lt;p>Cross-sectional methods (RA, IPW, DR) compare treated and control groups at a single point in time &amp;mdash; the endline. They control for &lt;strong>observable&lt;/strong> covariates like age, education, and gender. But there may be &lt;strong>unobservable&lt;/strong> characteristics &amp;mdash; household motivation, geographic advantages, cultural factors &amp;mdash; that differ between groups and affect consumption. No amount of cross-sectional covariate adjustment can control for these, because we simply do not observe them.&lt;/p>
&lt;p>&lt;strong>Analogy &amp;mdash; comparing students across schools.&lt;/strong> Imagine comparing test scores between students at a charter school (treatment) and a traditional school (control). You can adjust for observable differences like family income and prior grades. But what about unmeasured factors &amp;mdash; parental involvement, neighborhood quality, student ambition? A cross-sectional comparison cannot disentangle the school effect from these hidden differences. Now suppose you observe the &lt;em>same students&lt;/em> before and after they switch schools. By comparing each student&amp;rsquo;s score change, you automatically cancel out all fixed student characteristics &amp;mdash; because they are the same at both time points. That is the power of panel data.&lt;/p>
&lt;p>Panel data methods like difference-in-differences (DiD) solve this problem by comparing each household &lt;strong>to itself&lt;/strong> over time. By looking at how each household&amp;rsquo;s consumption changed from baseline to endline, we effectively control for all &lt;strong>time-invariant unobservable characteristics&lt;/strong> (household fixed effects). This is a powerful advantage that cross-sectional methods cannot replicate.&lt;/p>
&lt;h4 id="the-did-estimator">The DiD estimator&lt;/h4>
&lt;p>The DiD estimator computes a simple but powerful quantity &amp;mdash; a &amp;ldquo;difference of differences&amp;rdquo;:&lt;/p>
&lt;p>$$\hat{\tau}_{DiD} = \underbrace{(\bar{Y}_{treat,post} - \bar{Y}_{treat,pre})}_{\text{Change for treated}} - \underbrace{(\bar{Y}_{control,post} - \bar{Y}_{control,pre})}_{\text{Change for control}}$$&lt;/p>
&lt;p>The first difference ($\bar{Y}_{treat,post} - \bar{Y}_{treat,pre}$) captures the treatment group&amp;rsquo;s change over time &amp;mdash; the treatment effect &lt;strong>plus&lt;/strong> any common time trend (e.g., economic growth that affects all households). The second difference ($\bar{Y}_{control,post} - \bar{Y}_{control,pre}$) captures the control group&amp;rsquo;s change &amp;mdash; the common time trend &lt;strong>only&lt;/strong>, since they did not receive treatment. Subtracting the second from the first removes the time trend, isolating the treatment effect.&lt;/p>
&lt;p>&lt;strong>Mini example from our data.&lt;/strong> Suppose the treated group&amp;rsquo;s average log consumption went from 10.01 at baseline to 10.17 at endline (change = +0.16). The control group went from 10.03 to 10.06 (change = +0.03). The DiD estimate is $0.16 - 0.03 = 0.13$ &amp;mdash; close to the true effect of 0.12. The control group&amp;rsquo;s +0.03 change captures the natural time trend that would have affected everyone, and subtracting it isolates the treatment effect.&lt;/p>
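&lt;p>The same arithmetic can be sketched in Stata. This is a minimal illustration, assuming the tutorial dataset and its variables &lt;code>y&lt;/code>, &lt;code>treat&lt;/code>, &lt;code>post&lt;/code>, and &lt;code>id&lt;/code>; the scalar names are made up for this example, and the printed values need not match the rounded numbers in the mini example.&lt;/p>
&lt;pre>&lt;code class="language-stata">use &amp;quot;https://github.com/quarcs-lab/data-open/raw/master/ametrics/dataSIM4RCT.dta&amp;quot;, clear
* Mean log consumption in each of the four group-by-period cells
quietly summarize y if treat==1 &amp;amp; post==1
scalar y_t_post = r(mean)
quietly summarize y if treat==1 &amp;amp; post==0
scalar y_t_pre = r(mean)
quietly summarize y if treat==0 &amp;amp; post==1
scalar y_c_post = r(mean)
quietly summarize y if treat==0 &amp;amp; post==0
scalar y_c_pre = r(mean)
* Difference of differences: change for treated minus change for control
scalar did = (y_t_post - y_t_pre) - (y_c_post - y_c_pre)
display &amp;quot;Manual DiD = &amp;quot; %6.4f did
* Equivalent saturated regression: the 1.treat#1.post coefficient is the same DiD
regress y i.treat##i.post, vce(cluster id)
&lt;/code>&lt;/pre>
&lt;p>The interaction coefficient in the regression reproduces the manual difference of differences because the 2x2 model is fully saturated in group and period.&lt;/p>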
&lt;h4 id="the-parallel-trends-assumption">The parallel trends assumption&lt;/h4>
&lt;p>The key identifying assumption of DiD is the &lt;strong>parallel trends assumption (PTA)&lt;/strong>: absent the treatment, the treatment and control groups would have followed the same time trend. Formally:&lt;/p>
&lt;blockquote>
&lt;p>&lt;strong>Notation note&lt;/strong> &amp;mdash; In the DiD literature and in the Sant&amp;rsquo;Anna and Zhao (2020) paper, $D$ denotes treatment group assignment (equivalent to our &lt;code>treat&lt;/code> variable). This differs from our data dictionary where &lt;code>D&lt;/code> is the receipt indicator. In this section and Section 9.4, we follow the paper&amp;rsquo;s convention: $D = 1$ means assigned to treatment, $D = 0$ means assigned to control.&lt;/p>
&lt;/blockquote>
&lt;p>$$E[Y_1(0) - Y_0(0) \mid D = 1] = E[Y_1(0) - Y_0(0) \mid D = 0]$$&lt;/p>
&lt;p>This says that the average change in &lt;em>untreated&lt;/em> potential outcomes is the same for the treated and control groups. Note that this does &lt;strong>not&lt;/strong> require the two groups to have the same &lt;em>level&lt;/em> of consumption &amp;mdash; only the same &lt;em>trend&lt;/em>. The treated group can start higher or lower, as long as their consumption would have evolved at the same rate as the control group in the absence of the program.&lt;/p>
&lt;p>In an RCT, the parallel trends assumption is very plausible because randomization ensures the groups were similar at baseline. Any pre-existing differences between groups occurred by chance and are unlikely to produce different time trends. This makes DiD a strong estimator in our setting.&lt;/p>
&lt;pre>&lt;code class="language-mermaid">graph LR
subgraph &amp;quot;Parallel Trends Assumption&amp;quot;
PRE[&amp;quot;&amp;lt;b&amp;gt;Baseline 2021&amp;lt;/b&amp;gt;&amp;quot;]
POST[&amp;quot;&amp;lt;b&amp;gt;Endline 2024&amp;lt;/b&amp;gt;&amp;quot;]
end
PRE --&amp;gt;|&amp;quot;Treated group&amp;lt;br/&amp;gt;change = effect + trend&amp;quot;| POST
PRE --&amp;gt;|&amp;quot;Control group&amp;lt;br/&amp;gt;change = trend only&amp;quot;| POST
style PRE fill:#6a9bcc,stroke:#141413,color:#fff
style POST fill:#d97757,stroke:#141413,color:#fff
&lt;/code>&lt;/pre>
&lt;h3 id="92-why-does-did-estimate-att-and-not-ate">9.2 Why does DiD estimate ATT and not ATE?&lt;/h3>
&lt;p>This is a point that many beginners miss, so it is worth explaining carefully.&lt;/p>
&lt;p>Recall from Section 6 that the ATT is $E[Y_1(1) - Y_1(0) \mid D = 1]$ &amp;mdash; the effect on those who were treated. Sant&amp;rsquo;Anna and Zhao (2020) make this explicit: the main challenge is computing $E[Y_1(0) \mid D = 1]$ &amp;mdash; what would the treated group&amp;rsquo;s consumption have been at endline &lt;em>without&lt;/em> the program?&lt;/p>
&lt;p>DiD solves this by using the control group&amp;rsquo;s time trend as a stand-in. Specifically, it constructs the counterfactual for the treated group as:&lt;/p>
&lt;p>$$\underbrace{E[Y_1(0) \mid D = 1]}_{\text{Counterfactual}} = \underbrace{E[Y_0 \mid D = 1]}_{\text{Treated at baseline}} + \underbrace{(E[Y_1 \mid D = 0] - E[Y_0 \mid D = 0])}_{\text{Control group&amp;rsquo;s time trend}}$$&lt;/p>
&lt;p>This counterfactual is &lt;strong>specific to the treated group&lt;/strong> &amp;mdash; it starts from their baseline level and adds the control group&amp;rsquo;s trend. DiD therefore estimates what happened to the treated group relative to this counterfactual. This is precisely the ATT.&lt;/p>
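&lt;p>As a worked illustration, we can plug the rounded means from the mini example in Section 9.1 into this formula (illustrative numbers, not output from the estimation commands):&lt;/p>
&lt;p>$$E[Y_1(0) \mid D = 1] = 10.01 + (10.06 - 10.03) = 10.04, \qquad \widehat{ATT} = 10.17 - 10.04 = 0.13$$&lt;/p>
&lt;p>The result matches the difference of differences computed directly ($0.16 - 0.03 = 0.13$), as it must: both expressions rearrange the same four means.&lt;/p>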
&lt;p>&lt;strong>Why not the ATE?&lt;/strong> To estimate the ATE, we would also need the treatment effect for the untreated &amp;mdash; what would happen if we gave the program to those who did not receive it. DiD does not provide this, because the counterfactual it constructs runs in only one direction (control trend applied to treated baseline, not treated trend applied to control baseline).&lt;/p>
&lt;p>&lt;strong>In our RCT context&lt;/strong>, since treatment was randomly assigned, ATE and ATT are likely very similar (as we saw in Section 8). But in observational studies with heterogeneous treatment effects, this distinction matters greatly. A job-training program might have a larger effect on those who voluntarily enrolled (ATT) than it would have on randomly selected workers (ATE).&lt;/p>
&lt;h3 id="93-basic-did-with-panel-fixed-effects">9.3 Basic DiD with panel fixed effects&lt;/h3>
&lt;p>We now implement the basic DiD estimator using Stata&amp;rsquo;s &lt;code>xtdidregress&lt;/code> command, which handles the panel structure and computes clustered standard errors.&lt;/p>
&lt;pre>&lt;code class="language-stata">use &amp;quot;https://github.com/quarcs-lab/data-open/raw/master/ametrics/dataSIM4RCT.dta&amp;quot;, clear
* Create the treatment-post interaction
gen treat_post = treat * post
label var treat_post &amp;quot;Treated x Post (1 only for treated in 2024)&amp;quot;
* Declare panel structure
xtset id year
* Basic DiD with individual fixed effects
xtdidregress (y) (treat_post), group(id) time(year) vce(cluster id)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text"> Number of obs = 4,000
Number of groups = 2,000
Outcome model : linear
Treatment model: none
──────────────────────────────────────────────────────────────────────────────
| Robust
y | Coefficient std. err. t P&amp;gt;|t| [95% conf. interval]
─────────────+────────────────────────────────────────────────────────────────
ATET |
treat_post | .1347161 .0272737 4.94 0.000 .0812282 .188204
──────────────────────────────────────────────────────────────────────────────
&lt;/code>&lt;/pre>
&lt;p>The basic DiD estimate of the ATT is 0.135 (SE = 0.027, p &amp;lt; 0.001, 95% CI [0.081, 0.188]). This is slightly higher than the cross-sectional estimates (0.113&amp;ndash;0.116), but its confidence interval still contains the true effect of 0.12. The wider standard error (0.027 vs. 0.019) reflects the additional variability introduced by differencing within households: each household&amp;rsquo;s change carries the noise from both survey waves. Standard errors are clustered at the household level to account for serial correlation within panels.&lt;/p>
&lt;p>The key advantage of this DiD estimate is that it controls for all &lt;strong>time-invariant unobservable characteristics&lt;/strong> of each household. In an RCT, randomization already handles confounding, so the cross-sectional and panel estimates are similar. But in observational settings, DiD&amp;rsquo;s ability to absorb household fixed effects can correct biases that cross-sectional methods cannot.&lt;/p>
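&lt;p>As a cross-check, the same point estimate can be obtained from a two-way fixed effects regression or from a regression on household-level changes. This is a minimal sketch, assuming the data are still in memory with &lt;code>treat_post&lt;/code> created and &lt;code>xtset id year&lt;/code> declared as above; the variable &lt;code>dy&lt;/code> is introduced here only for illustration.&lt;/p>
&lt;pre>&lt;code class="language-stata">* TWFE form of the 2x2 DiD: household fixed effects plus a year dummy
xtreg y treat_post i.year, fe vce(cluster id)
* First-difference form: one change per household, regressed on assignment
bysort id (year): gen dy = y - y[_n-1] if _n == 2
regress dy i.treat, vce(robust)
&lt;/code>&lt;/pre>
&lt;p>With two periods and a balanced panel, both coefficients should coincide with the &lt;code>xtdidregress&lt;/code> estimate; the standard errors can differ slightly because of degrees-of-freedom adjustments.&lt;/p>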
&lt;h3 id="94-from-cross-sectional-dr-to-panel-dr-----doubly-robust-did-drdid">9.4 From cross-sectional DR to panel DR &amp;mdash; Doubly Robust DiD (DRDID)&lt;/h3>
&lt;p>In Section 7, we saw that doubly robust methods combine outcome modeling and propensity score modeling for cross-sectional data. &lt;strong>DRDID extends this logic to the panel setting.&lt;/strong> It combines the DiD framework (using pre/post variation) with doubly robust covariate adjustment.&lt;/p>
&lt;p>This approach was introduced by Sant&amp;rsquo;Anna and Zhao (2020) in a landmark paper published in the &lt;em>Journal of Econometrics&lt;/em>. They proposed estimators that are &amp;ldquo;consistent if either (but not necessarily both) a propensity score or outcome regression working models are correctly specified&amp;rdquo; &amp;mdash; bringing the doubly robust property from the cross-sectional world into the DiD framework.&lt;/p>
&lt;h4 id="why-do-we-need-drdid">Why do we need DRDID?&lt;/h4>
&lt;p>Recall from Section 9.1 that basic DiD relies on the &lt;strong>parallel trends assumption&lt;/strong>: absent treatment, the treated and control groups would have followed the same time trend. But what if parallel trends holds only &lt;strong>conditional on covariates&lt;/strong>? For example, what if consumption trends differ between poor and non-poor households, but within each poverty group the trends are parallel?&lt;/p>
&lt;p>In this case, we need a &lt;strong>conditional&lt;/strong> parallel trends assumption:&lt;/p>
&lt;p>$$E[Y_1(0) - Y_0(0) \mid D = 1, X] = E[Y_1(0) - Y_0(0) \mid D = 0, X]$$&lt;/p>
&lt;p>This says that the average change in untreated potential outcomes is the same for treated and control groups &lt;em>who share the same covariates&lt;/em> $X$. Note that this allows for covariate-specific time trends (e.g., different consumption growth rates for poor and non-poor households) while still identifying the ATT.&lt;/p>
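&lt;p>One informal way to see whether covariate-specific trends are present is to look at the control group alone, where no treatment effect can interfere. The sketch below is an illustration rather than a formal test of the assumption, and it assumes &lt;code>poverty&lt;/code> is a 0/1 indicator in the tutorial dataset.&lt;/p>
&lt;pre>&lt;code class="language-stata">use &amp;quot;https://github.com/quarcs-lab/data-open/raw/master/ametrics/dataSIM4RCT.dta&amp;quot;, clear
* Control households only: does the baseline-to-endline change differ by poverty status?
* The 1.post#1.poverty coefficient is the extra time trend for poor households
regress y i.post##i.poverty if treat==0, vce(cluster id)
&lt;/code>&lt;/pre>
&lt;p>A sizeable interaction coefficient would suggest that trends vary with poverty status, which is the situation where conditioning on covariates matters.&lt;/p>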
&lt;p>Under this conditional parallel trends assumption, there are two ways to estimate the ATT:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Outcome regression (OR) approach&lt;/strong> &amp;mdash; model how the outcome evolves over time for the control group, and use that model to predict the counterfactual evolution for the treated group&lt;/li>
&lt;li>&lt;strong>IPW approach&lt;/strong> &amp;mdash; reweight the control group so its covariate distribution matches the treated group, then compute the standard DiD&lt;/li>
&lt;/ul>
&lt;p>The problem is the same as in the cross-sectional case: OR requires a correctly specified outcome model, and IPW requires a correctly specified propensity score model. Sant&amp;rsquo;Anna and Zhao&amp;rsquo;s insight was that &lt;strong>you can combine both into a single estimator that works if either model is correct&lt;/strong>.&lt;/p>
&lt;h4 id="the-drdid-estimator-for-panel-data">The DRDID estimator for panel data&lt;/h4>
&lt;p>When panel data are available (as in our case, with the same households observed at baseline and endline), the DRDID estimator takes a particularly clean form. Let $\Delta Y_i = Y_{i,post} - Y_{i,pre}$ denote each household&amp;rsquo;s change in consumption. The DRDID estimator is:&lt;/p>
&lt;p>$$\hat{\tau}_{DR}^{DiD} = \frac{1}{N} \sum_{i=1}^{N} \left[ w_1(D_i) - w_0(D_i, X_i) \right] \left[ \Delta Y_i - \hat{\mu}_{0,\Delta}(X_i) \right]$$&lt;/p>
&lt;p>where:&lt;/p>
&lt;ul>
&lt;li>$w_1(D_i) = D_i / \bar{D}$ assigns equal weight to each treated unit and zero weight to each control unit, where $\bar{D} = N_1 / N$ is the fraction treated. These weights average to one, so the $w_1$ part of the sum is simply the treated-group mean of the residuals&lt;/li>
&lt;li>$w_0(D_i, X_i)$ is proportional to $\hat{p}(X_i)(1 - D_i) / (1 - \hat{p}(X_i))$ and is normalized to average to one; it reweights control units using the propensity score $\hat{p}(X)$, so they resemble the treated group&lt;/li>
&lt;li>$\hat{\mu}_{0,\Delta}(X_i) = \hat{\mu}_{0,post}(X_i) - \hat{\mu}_{0,pre}(X_i)$ is the predicted change in consumption for the control group, fitted from control-group data&lt;/li>
&lt;/ul>
&lt;p>In plain language: for each household, compute the change in consumption over time ($\Delta Y$) and subtract the model-predicted change for the control group ($\hat{\mu}_{0,\Delta}$). This residual captures the treatment effect plus any prediction error. Then reweight these residuals using IPW so that the control group matches the treated group&amp;rsquo;s covariate profile.&lt;/p>
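&lt;p>The steps above can be sketched by hand in Stata. This is a simplified illustration under stated assumptions, not the package&amp;rsquo;s implementation: it uses the covariate values recorded in the endline row, the helper variables (&lt;code>dy&lt;/code>, &lt;code>ps&lt;/code>, &lt;code>mu0&lt;/code>, &lt;code>resid&lt;/code>, &lt;code>w1&lt;/code>, &lt;code>w0&lt;/code>, &lt;code>contrib&lt;/code>) are made up for this example, and it produces a point estimate only, with no standard error.&lt;/p>
&lt;pre>&lt;code class="language-stata">use &amp;quot;https://github.com/quarcs-lab/data-open/raw/master/ametrics/dataSIM4RCT.dta&amp;quot;, clear
* Step 1: household-level change in log consumption (one row per household)
bysort id (year): gen double dy = y - y[_n-1] if _n == 2
keep if post==1
* Step 2: propensity score for being in the treatment group
quietly logit treat c.age c.edu i.female i.poverty
predict double ps, pr
* Step 3: outcome model for the change, fitted on controls, predicted for everyone
quietly regress dy c.age c.edu i.female i.poverty if treat==0
predict double mu0, xb
gen double resid = dy - mu0
* Step 4: weights, each normalized by its sample mean
gen double w1 = treat
gen double w0 = ps * (1 - treat) / (1 - ps)
quietly summarize w1
scalar m1 = r(mean)
quietly summarize w0
scalar m0 = r(mean)
* Step 5: weighted average of residuals = doubly robust DiD estimate of the ATT
gen double contrib = (w1 / m1 - w0 / m0) * resid
quietly summarize contrib
display &amp;quot;By-hand DR-DiD ATT = &amp;quot; %6.4f r(mean)
&lt;/code>&lt;/pre>
&lt;p>The result should be close to what a dedicated doubly robust DiD command reports, though small differences are possible depending on which covariate wave and which weighting variant the command uses.&lt;/p>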
&lt;h4 id="why-is-this-doubly-robust">Why is this doubly robust?&lt;/h4>
&lt;p>The doubly robust property works through the same logic as in the cross-sectional case (Section 7.3), but applied to &lt;strong>changes&lt;/strong> rather than levels:&lt;/p>
&lt;ol>
&lt;li>
&lt;p>&lt;strong>If the outcome model is correct&lt;/strong> ($\hat{\mu}_{0,\Delta}(X) = E[\Delta Y \mid D=0, X]$), then the residuals $\Delta Y_i - \hat{\mu}_{0,\Delta}(X_i)$ average to zero for the control group, regardless of the propensity score weights. The estimator reduces to an outcome-regression DiD. Correct answer.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>If the propensity score model is correct&lt;/strong> ($\hat{p}(X) = \Pr(D=1 \mid X)$), the IPW reweighting makes the control group comparable to the treated group, regardless of the outcome model. The correction term fixes any bias from a misspecified outcome model. Correct answer.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>If both are correct&lt;/strong>, the estimator achieves the &lt;strong>semiparametric efficiency bound&lt;/strong> &amp;mdash; it is the most precise estimator possible given the assumptions. Sant&amp;rsquo;Anna and Zhao proved this formally.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>If both are wrong&lt;/strong>, the estimator can be biased &amp;mdash; double robustness provides one layer of insurance, not two.&lt;/p>
&lt;/li>
&lt;/ol>
&lt;pre>&lt;code class="language-mermaid">graph TD
DY[&amp;quot;&amp;lt;b&amp;gt;Panel data&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;ΔY = Y_post − Y_pre&amp;lt;br/&amp;gt;for each household&amp;quot;]
OR[&amp;quot;&amp;lt;b&amp;gt;Outcome model&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;Predict control group's&amp;lt;br/&amp;gt;consumption change&amp;lt;br/&amp;gt;μ̂₀,Δ(X)&amp;quot;]
PS[&amp;quot;&amp;lt;b&amp;gt;Propensity score&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;Estimate p(X)&amp;lt;br/&amp;gt;= Pr(D=1 | X)&amp;quot;]
RES[&amp;quot;&amp;lt;b&amp;gt;Residuals&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;ΔY − μ̂₀,Δ(X)&amp;quot;]
IPW_W[&amp;quot;&amp;lt;b&amp;gt;IPW reweighting&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;Make controls look&amp;lt;br/&amp;gt;like treated group&amp;quot;]
DRDID[&amp;quot;&amp;lt;b&amp;gt;DR-DiD estimate&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;ATT = weighted average&amp;lt;br/&amp;gt;of residuals&amp;quot;]
DY --&amp;gt; RES
OR --&amp;gt; RES
PS --&amp;gt; IPW_W
RES --&amp;gt; DRDID
IPW_W --&amp;gt; DRDID
style DY fill:#141413,stroke:#00d4c8,color:#fff
style OR fill:#6a9bcc,stroke:#141413,color:#fff
style PS fill:#d97757,stroke:#141413,color:#fff
style RES fill:#6a9bcc,stroke:#141413,color:#fff
style IPW_W fill:#d97757,stroke:#141413,color:#fff
style DRDID fill:#00d4c8,stroke:#141413,color:#141413
&lt;/code>&lt;/pre>
&lt;h4 id="what-drdid-adds-over-basic-did-and-twfe">What DRDID adds over basic DiD and TWFE&lt;/h4>
&lt;p>Sant&amp;rsquo;Anna and Zhao (2020) also showed that the standard two-way fixed effects (TWFE) estimator &amp;mdash; the workhorse of applied economics &amp;mdash; can produce misleading results when treatment effects are heterogeneous across covariates. Specifically, the TWFE estimator implicitly assumes (i) that treatment effects are the same for all covariate values, and (ii) that there are no covariate-specific time trends. When these assumptions fail, &amp;ldquo;the estimand is, in general, different from the ATT, and policy evaluation based on it may be misleading.&amp;rdquo; DRDID avoids both of these pitfalls by allowing for flexible outcome models and covariate-specific trends.&lt;/p>
&lt;h4 id="stata-implementation">Stata implementation&lt;/h4>
&lt;p>The &lt;code>drdid&lt;/code> package (Rios-Avila, Sant&amp;rsquo;Anna, and Callaway) implements the estimators from the paper.&lt;/p>
&lt;pre>&lt;code class="language-stata">* Install the drdid package (only needed once)
ssc install drdid, replace
* Doubly Robust DiD with DRIPW estimator
drdid y c.age c.edu i.female i.poverty, ivar(id) time(year) treatment(treat) dripw
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">Doubly robust difference-in-differences estimator
Outcome model : least squares
Treatment model: inverse probability
──────────────────────────────────────────────────────────────────────────────
| Coefficient std. err. z P&amp;gt;|z| [95% conf. interval]
─────────────+────────────────────────────────────────────────────────────────
ATET | .1374784 .027387 5.02 0.000 .0838008 .191156
──────────────────────────────────────────────────────────────────────────────
&lt;/code>&lt;/pre>
&lt;p>The DRDID estimate of the ATT is 0.137 (SE = 0.027, p &amp;lt; 0.001, 95% CI [0.084, 0.191]). The &lt;code>dripw&lt;/code> option specifies the Doubly Robust Inverse Probability Weighting estimator, which uses a linear least squares model for the outcome evolution of the control group and a logistic model for the propensity score. The result is slightly higher than basic DiD (0.135) and close to the true effect of 0.12.&lt;/p>
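&lt;p>To see the two component estimators next to the doubly robust one, the package can also be asked for outcome-regression DiD and IPW DiD on their own. This sketch assumes the &lt;code>reg&lt;/code> and &lt;code>stdipw&lt;/code> options described in the package help file; check &lt;code>help drdid&lt;/code> for the version you have installed.&lt;/p>
&lt;pre>&lt;code class="language-stata">use &amp;quot;https://github.com/quarcs-lab/data-open/raw/master/ametrics/dataSIM4RCT.dta&amp;quot;, clear
* Outcome-regression DiD only: relies on the outcome model being right
drdid y c.age c.edu i.female i.poverty, ivar(id) time(year) treatment(treat) reg
* Standardized IPW DiD only: relies on the propensity score model being right
drdid y c.age c.edu i.female i.poverty, ivar(id) time(year) treatment(treat) stdipw
&lt;/code>&lt;/pre>
&lt;p>In a randomized design we would expect all three estimates to be close; large gaps between them in an observational study would hint that one of the working models is misspecified.&lt;/p>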
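&lt;p>&lt;strong>Alternative: Stata 18+ built-in command.&lt;/strong> Stata 18 and later versions include a built-in doubly robust DiD estimator, &lt;code>xthdidregress aipw&lt;/code>, that does not require installing external packages. It expects the data to be &lt;code>xtset&lt;/code> and uses the &lt;code>treat_post&lt;/code> indicator created in Section 9.3.&lt;/p>
&lt;pre>&lt;code class="language-stata">* Built-in AIPW DiD (data must already be xtset id year)
xthdidregress aipw (y c.age c.edu i.female i.poverty) ///
(treat_post c.age c.edu i.female i.poverty), group(id)
&lt;/code>&lt;/pre>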
&lt;pre>&lt;code class="language-text">Heterogeneous-treatment-effects regression Number of obs = 4,000
Number of panels = 2,000
Estimator: Augmented IPW
Panel variable: id
Treatment level: id
Control group: Never treated
(Std. err. adjusted for 2,000 clusters in id)
──────────────────────────────────────────────────────────────────────────────
| Robust
Cohort | ATET std. err. z P&amp;gt;|z| [95% conf. interval]
─────────────+────────────────────────────────────────────────────────────────
year |
2024 | .1374784 .027387 5.02 0.000 .0838008 .191156
──────────────────────────────────────────────────────────────────────────────
Note: ATET computed using covariates.
&lt;/code>&lt;/pre>
&lt;p>The &lt;code>xthdidregress aipw&lt;/code> command produces the same ATT estimate of 0.137 (SE = 0.027, 95% CI [0.084, 0.191]) as the &lt;code>drdid&lt;/code> package &amp;mdash; confirming that both implement the same doubly robust DiD methodology. The output labels the result as &amp;ldquo;Cohort year 2024&amp;rdquo; because &lt;code>xthdidregress&lt;/code> is designed for settings with staggered treatment adoption across multiple cohorts; in our two-period design, there is only one treatment cohort (households treated in 2024). As the Stata manual explains, &amp;ldquo;AIPW models both treatment and outcome. If at least one of the models is correctly specified, it provides consistent estimates, a property called double robustness.&amp;rdquo;&lt;/p>
&lt;p>The agreement between &lt;code>drdid&lt;/code> (community package) and &lt;code>xthdidregress aipw&lt;/code> (built-in) provides a useful robustness check &amp;mdash; researchers can verify their results using both implementations.&lt;/p>
&lt;h4 id="panel-data-vs-repeated-cross-sections">Panel data vs. repeated cross-sections&lt;/h4>
&lt;p>An important result from Sant&amp;rsquo;Anna and Zhao (2020) is that DiD estimators of the ATT can be &lt;strong>more efficient&lt;/strong> with panel data than with repeated cross-sections. The intuition is straightforward: with panel data, we observe each household&amp;rsquo;s individual change over time ($\Delta Y_i$), which removes time-invariant household-level variation. With repeated cross-sections, we can only compare group averages at different time points, which introduces additional noise. The efficiency gain is larger when the sample sizes in the pre and post periods are more imbalanced.&lt;/p>
&lt;p>In our study, we have a balanced panel (same 2,000 households at baseline and endline), so we benefit from this efficiency advantage.&lt;/p>
&lt;h3 id="95-cross-sectional-vs-panel-comparison">9.5 Cross-sectional vs. panel comparison&lt;/h3>
&lt;p>The table below compares our best cross-sectional estimates with the panel-based DiD estimates.&lt;/p>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th>Method&lt;/th>
&lt;th>Approach&lt;/th>
&lt;th>Estimand&lt;/th>
&lt;th>Data Used&lt;/th>
&lt;th style="text-align:center">Estimate&lt;/th>
&lt;th style="text-align:center">SE&lt;/th>
&lt;th style="text-align:center">95% CI&lt;/th>
&lt;th style="text-align:center">Contains 0.12?&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>Simple regression&lt;/td>
&lt;td>None&lt;/td>
&lt;td>ATE&lt;/td>
&lt;td>Endline only&lt;/td>
&lt;td style="text-align:center">0.116&lt;/td>
&lt;td style="text-align:center">0.019&lt;/td>
&lt;td style="text-align:center">[0.078, 0.154]&lt;/td>
&lt;td style="text-align:center">Yes&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>RA&lt;/td>
&lt;td>Outcome model&lt;/td>
&lt;td>ATE&lt;/td>
&lt;td>Endline only&lt;/td>
&lt;td style="text-align:center">0.113&lt;/td>
&lt;td style="text-align:center">0.019&lt;/td>
&lt;td style="text-align:center">[0.075, 0.150]&lt;/td>
&lt;td style="text-align:center">Yes&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>IPW&lt;/td>
&lt;td>Treatment model&lt;/td>
&lt;td>ATE&lt;/td>
&lt;td>Endline only&lt;/td>
&lt;td style="text-align:center">0.113&lt;/td>
&lt;td style="text-align:center">0.019&lt;/td>
&lt;td style="text-align:center">[0.075, 0.150]&lt;/td>
&lt;td style="text-align:center">Yes&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>DR (IPWRA)&lt;/td>
&lt;td>Both models&lt;/td>
&lt;td>ATE&lt;/td>
&lt;td>Endline only&lt;/td>
&lt;td style="text-align:center">0.113&lt;/td>
&lt;td style="text-align:center">0.019&lt;/td>
&lt;td style="text-align:center">[0.075, 0.150]&lt;/td>
&lt;td style="text-align:center">Yes&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Basic DiD&lt;/td>
&lt;td>Panel FE&lt;/td>
&lt;td>&lt;strong>ATT&lt;/strong>&lt;/td>
&lt;td>&lt;strong>Both waves&lt;/strong>&lt;/td>
&lt;td style="text-align:center">0.135&lt;/td>
&lt;td style="text-align:center">0.027&lt;/td>
&lt;td style="text-align:center">[0.081, 0.188]&lt;/td>
&lt;td style="text-align:center">Yes&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>DR-DiD (&lt;code>drdid&lt;/code>)&lt;/td>
&lt;td>Both + Panel&lt;/td>
&lt;td>&lt;strong>ATT&lt;/strong>&lt;/td>
&lt;td>&lt;strong>Both waves&lt;/strong>&lt;/td>
&lt;td style="text-align:center">0.137&lt;/td>
&lt;td style="text-align:center">0.027&lt;/td>
&lt;td style="text-align:center">[0.084, 0.191]&lt;/td>
&lt;td style="text-align:center">Yes&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>DR-DiD (&lt;code>xthdidregress&lt;/code>)&lt;/td>
&lt;td>Both + Panel&lt;/td>
&lt;td>&lt;strong>ATT&lt;/strong>&lt;/td>
&lt;td>&lt;strong>Both waves&lt;/strong>&lt;/td>
&lt;td style="text-align:center">0.137&lt;/td>
&lt;td style="text-align:center">0.027&lt;/td>
&lt;td style="text-align:center">[0.084, 0.191]&lt;/td>
&lt;td style="text-align:center">Yes&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;strong>True effect&lt;/strong>&lt;/td>
&lt;td>&lt;/td>
&lt;td>&lt;/td>
&lt;td>&lt;/td>
&lt;td style="text-align:center">&lt;strong>0.12&lt;/strong>&lt;/td>
&lt;td style="text-align:center">&lt;/td>
&lt;td style="text-align:center">&lt;/td>
&lt;td style="text-align:center">&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;p>Several important patterns emerge from this comparison. Cross-sectional methods estimate &lt;strong>ATE&lt;/strong> using only endline data, while DiD methods estimate &lt;strong>ATT&lt;/strong> using both survey waves. The two DR-DiD implementations (&lt;code>drdid&lt;/code> and &lt;code>xthdidregress aipw&lt;/code>) produce identical results, confirming methodological consistency. The DiD estimates (0.135&amp;ndash;0.137) are slightly higher than the cross-sectional estimates (0.113), but &lt;strong>all confidence intervals contain the true effect of 0.12&lt;/strong>. DiD&amp;rsquo;s wider standard errors (0.027 vs. 0.019) reflect the additional variability from differencing within households.&lt;/p>
&lt;p>The key value of DiD is &lt;strong>not&lt;/strong> tighter standard errors; it is &lt;strong>robustness to time-invariant unobservables.&lt;/strong> In observational settings where randomization does not hold, DiD can correct biases that cross-sectional methods cannot address. In this RCT, randomization already handles confounding, so the estimates are similar. DRDID adds doubly robust protection on top of DiD, making it the most robust of the panel methods considered here.&lt;/p>
&lt;hr>
&lt;h2 id="10-offer-vs-receipt-----endogenous-treatment-advanced">10. Offer vs. receipt &amp;mdash; endogenous treatment (advanced)&lt;/h2>
&lt;blockquote>
&lt;p>&lt;strong>Note:&lt;/strong> This section addresses the advanced topic of imperfect compliance and endogenous treatment. Readers new to causal inference may wish to skip this section on a first reading and return to it later.&lt;/p>
&lt;/blockquote>
&lt;h3 id="101-the-compliance-problem">10.1 The compliance problem&lt;/h3>
&lt;p>All estimates in Sections 8 and 9 measure the effect of &lt;strong>being offered&lt;/strong> the cash transfer (&lt;code>treat&lt;/code>), not the effect of &lt;strong>actually receiving&lt;/strong> it (&lt;code>D&lt;/code>). This is the intent-to-treat (ITT) approach &amp;mdash; it captures the policy-relevant effect of the offer, regardless of whether households complied.&lt;/p>
&lt;p>But what about the effect of actual receipt? This is more complex because compliance is &lt;strong>not random&lt;/strong>. Only 85% of households assigned to treatment received the transfer, and 5% of control households received it through other channels. The households that chose to take up the program may differ systematically from those that did not: they may be more motivated, more financially constrained, or better connected. Naively comparing receivers to non-receivers would introduce &lt;strong>selection bias&lt;/strong>.&lt;/p>
&lt;p>The solution is to use the random assignment (&lt;code>treat&lt;/code>) as an &lt;strong>instrumental variable&lt;/strong> for actual receipt (&lt;code>D&lt;/code>). Because &lt;code>treat&lt;/code> was randomly assigned, it is independent of household characteristics and satisfies the requirements for a valid instrument. This allows us to isolate the causal effect of receipt, at least for the subset of households whose receipt was determined by the offer (the &amp;ldquo;compliers&amp;rdquo;).&lt;/p>
&lt;p>&lt;strong>Analogy &amp;mdash; prescriptions and pills.&lt;/strong> Imagine a doctor randomly prescribes a medication to some patients, but not all patients fill their prescription. We cannot simply compare those who took the pill to those who did not, because pill-takers may be more health-conscious. Instead, we use the random prescription (the &amp;ldquo;offer&amp;rdquo;) as a nudge &amp;mdash; it strongly predicts whether you take the pill but does not directly affect your health except through the pill. That is the instrumental variable approach: using the random offer to estimate the causal effect of actual receipt.&lt;/p>
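&lt;p>The instrumental variable logic can be sketched with standard Stata commands before turning to the estimator used in the next subsection. This is a minimal illustration with no covariates, assuming the tutorial dataset; it uses linear two-stage least squares, which is a different estimator from the maximum likelihood approach of &lt;code>etregress&lt;/code>.&lt;/p>
&lt;pre>&lt;code class="language-stata">use &amp;quot;https://github.com/quarcs-lab/data-open/raw/master/ametrics/dataSIM4RCT.dta&amp;quot;, clear
keep if post==1
* Compliance: share of households receiving the transfer (D) in each assignment arm
tabulate treat D, row
* First stage: effect of the random offer on actual receipt
regress D i.treat, vce(robust)
* Reduced form (ITT): effect of the random offer on log consumption
regress y i.treat, vce(robust)
* 2SLS: the offer instruments for receipt; without covariates this equals
* the reduced-form coefficient divided by the first-stage coefficient
ivregress 2sls y (D = treat), vce(robust)
&lt;/code>&lt;/pre>
&lt;p>The 2SLS coefficient on &lt;code>D&lt;/code> is interpreted as the effect of receipt for compliers, the households whose receipt was determined by the offer.&lt;/p>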
&lt;h3 id="102-endogenous-treatment-regression">10.2 Endogenous treatment regression&lt;/h3>
&lt;p>Stata&amp;rsquo;s &lt;code>etregress&lt;/code> command estimates the effect of an endogenous treatment variable, using the random assignment as an excluded instrument.&lt;/p>
&lt;pre>&lt;code class="language-stata">use &amp;quot;https://github.com/quarcs-lab/data-open/raw/master/ametrics/dataSIM4RCT.dta&amp;quot;, clear
keep if post==1
* Endogenous treatment regression
etregress y c.age i.female i.poverty c.edu, ///
treat(D = treat c.age i.female i.poverty c.edu) vce(robust)
* Mark estimation sample
gen byte esample = e(sample)
* ATE of receipt
margins r.D if esample==1
* ATT of receipt
margins, predict(cte) subpop(if D==1 &amp;amp; esample==1)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">Linear regression with endogenous treatment Number of obs = 2,000
Estimator: Maximum likelihood Wald chi2(5) = 92.23
Log pseudolikelihood = -1797.6297 Prob &amp;gt; chi2 = 0.0000
──────────────────────────────────────────────────────────────────────────────
| Robust
| Coefficient std. err. z P&amp;gt;|z| [95% conf. interval]
─────────────+────────────────────────────────────────────────────────────────
y |
age | .003187 .0010016 3.18 0.001 .001224 .0051501
1.female | .0801465 .0189552 4.23 0.000 .042995 .117298
1.poverty | -.1030302 .0205984 -5.00 0.000 -.1434023 -.062658
edu | .0182634 .0045243 4.04 0.000 .0093959 .0271308
1.D | .1471 .0246775 5.96 0.000 .0987329 .1954671
_cons | 9.705642 .0694641 139.72 0.000 9.569495 9.841789
─────────────+────────────────────────────────────────────────────────────────
D |
treat | 2.55806 .0802103 31.89 0.000 2.40085 2.715269
_cons | -1.844408 .2847883 -6.48 0.000 -2.402582 -1.286233
─────────────+────────────────────────────────────────────────────────────────
/athrho | -.0060068 .0481062 -0.12 0.901 -.1002933 .0882796
sigma | .4245195 .0066426 .411698 .4377404
──────────────────────────────────────────────────────────────────────────────
Wald test of indep. eqns. (rho = 0): chi2(1) = 0.02 Prob &amp;gt; chi2 = 0.9006
ATE of receipt (margins r.D):
──────────────────────────────────────────────────────────────────────────────
D | Contrast std. err. [95% conf. interval]
─────────────+────────────────────────────────────────────────────────────────
(1 vs 0) | .1471 .0246775 .0987329 .1954671
──────────────────────────────────────────────────────────────────────────────
ATT of receipt (margins, predict(cte)):
──────────────────────────────────────────────────────────────────────────────
_cons | Margin std. err. z P&amp;gt;|z| [95% conf. interval]
─────────────+────────────────────────────────────────────────────────────────
| .1471 .0246775 5.96 0.000 .0987329 .1954671
──────────────────────────────────────────────────────────────────────────────
&lt;/code>&lt;/pre>
&lt;p>The &lt;code>etregress&lt;/code> output reveals several important findings. The coefficient on &lt;code>D&lt;/code> (receipt) is 0.147 (SE = 0.025, p &amp;lt; 0.001, 95% CI [0.099, 0.195]), which is the estimated effect of actually receiving the cash transfer. This is larger than the offer-based estimates (0.113&amp;ndash;0.116) because not everyone who was offered the program received it &amp;mdash; the per-recipient effect is naturally larger than the per-offer effect. The Wald test of independent equations (rho = 0) has p = 0.901, indicating no evidence of endogeneity &amp;mdash; consistent with a well-designed RCT where unobservable factors do not drive both treatment receipt and consumption. The &lt;code>margins&lt;/code> commands confirm that both the ATE and ATT of receipt are 0.147 (identical in this case because the model assumes a constant treatment effect).&lt;/p>
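&lt;p>A rough back-of-the-envelope check makes the gap between the offer effect and the receipt effect concrete. Assuming the rounded compliance rates from Section 10.1 (85% and 5%) and taking the simple difference in means (0.116) as the offer effect, dividing by the difference in receipt rates between the two arms gives an approximate per-recipient effect:&lt;/p>
&lt;p>$$\frac{\text{offer effect}}{\Pr(D = 1 \mid \text{treat} = 1) - \Pr(D = 1 \mid \text{treat} = 0)} \approx \frac{0.116}{0.85 - 0.05} = 0.145$$&lt;/p>
&lt;p>This ignores covariates and uses rounded inputs, so it is only approximate, but it lands close to the &lt;code>etregress&lt;/code> coefficient of 0.147.&lt;/p>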
&lt;h3 id="103-doubly-robust-estimation-of-receipt-effect">10.3 Doubly robust estimation of receipt effect&lt;/h3>
&lt;p>We can also estimate the receipt effect using a doubly robust approach, incorporating the baseline outcome &lt;code>y0&lt;/code> as an additional control variable (an ANCOVA-style adjustment) and including &lt;code>treat&lt;/code> (the random assignment) as a covariate in the treatment model for &lt;code>D&lt;/code>.&lt;/p>
&lt;pre>&lt;code class="language-stata">use &amp;quot;https://github.com/quarcs-lab/data-open/raw/master/ametrics/dataSIM4RCT.dta&amp;quot;, clear
keep if post==1
* Doubly robust ATE of receipt, controlling for baseline outcome
teffects ipwra (y y0 c.age i.female i.poverty c.edu) ///
(D c.age i.female i.poverty c.edu treat), vce(robust)
* Diagnostic checks
tebalance summarize age edu i.female i.poverty
tebalance summarize, baseline
tebalance density y0
tebalance density age
teffects overlap
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">Treatment-effects estimation Number of obs = 2,000
Estimator : IPW regression adjustment
Outcome model : linear
Treatment model: logit
──────────────────────────────────────────────────────────────────────────────
| Robust
y | Coefficient std. err. z P&amp;gt;|z| [95% conf. interval]
─────────────+────────────────────────────────────────────────────────────────
ATE |
D |
(1 vs 0) | .1172686 .0322495 3.64 0.000 .0540608 .1804764
─────────────+────────────────────────────────────────────────────────────────
POmean |
D |
0 | 10.03361 .0171459 585.19 0.000 10 10.06722
──────────────────────────────────────────────────────────────────────────────
&lt;/code>&lt;/pre>
&lt;p>The doubly robust estimate of the ATE of receipt is 0.117 (SE = 0.032, 95% CI [0.054, 0.180]). This is lower than the &lt;code>etregress&lt;/code> estimate (0.147) and closer to the true effect of 0.12. The larger standard error (0.032 vs. 0.025) reflects the additional flexibility of the doubly robust approach. This specification includes &lt;code>y0&lt;/code> (the baseline outcome) in the outcome model, which controls for pre-treatment differences in consumption levels. The variable &lt;code>treat&lt;/code> appears in the treatment model for &lt;code>D&lt;/code> because random assignment is the strongest predictor of receipt.&lt;/p>
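&lt;p>The reported interval can be reproduced by hand from the point estimate and its standard error, which shows where the judgment about covering 0.12 comes from. A minimal sketch, with the two numbers copied from the output above and scalar names that are purely illustrative:&lt;/p>
&lt;pre>&lt;code class="language-stata">* Values copied from the teffects ipwra output; scalar names are illustrative
scalar ate_hat = .1172686
scalar se_hat  = .0322495
* Two-sided 95% critical value from the standard normal
scalar zcrit   = invnormal(0.975)
scalar ci_lo   = ate_hat - zcrit*se_hat
scalar ci_hi   = ate_hat + zcrit*se_hat
display &amp;quot;z = &amp;quot; %5.2f (ate_hat/se_hat)
display &amp;quot;95% CI: [&amp;quot; %9.7f (ci_lo) &amp;quot;, &amp;quot; %9.7f (ci_hi) &amp;quot;]&amp;quot;
* inrange(x, a, b) returns 1 when x lies between a and b, inclusive
display &amp;quot;CI contains 0.12: &amp;quot; inrange(0.12, ci_lo, ci_hi)
&lt;/code>&lt;/pre>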
&lt;p>The diagnostic graphs below verify adequate covariate balance and propensity score overlap for the receipt model.&lt;/p>
&lt;p>&lt;img src="stata_rct_density_y0_receipt.png" alt="Density plot of baseline consumption (y0) for receivers and non-receivers, before and after IPWRA weighting.">&lt;/p>
&lt;p>&lt;img src="stata_rct_overlap_receipt.png" alt="Overlap plot showing propensity score distributions for receivers and non-receivers of the cash transfer.">&lt;/p>
&lt;p>The density and overlap plots confirm that the IPWRA weighting achieves good balance between receivers and non-receivers. After weighting, the effective sample sizes are approximately 999 treated and 1,001 control (rebalanced from the raw 923 receivers and 1,077 non-receivers). The weighted covariate means are closely aligned &amp;mdash; for example, the weighted mean age is 35.0 for receivers versus 35.2 for non-receivers, and the weighted poverty rate is 31.1% versus 31.4%. The propensity scores show sufficient overlap for reliable estimation.&lt;/p>
&lt;hr>
&lt;h2 id="11-comparing-all-estimates-----the-big-picture">11. Comparing all estimates &amp;mdash; the big picture&lt;/h2>
&lt;p>The table below brings together all estimates from the tutorial, providing a comprehensive overview of how different methods, estimands, and data structures relate to each other.&lt;/p>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th>#&lt;/th>
&lt;th>Method&lt;/th>
&lt;th>Approach&lt;/th>
&lt;th>Estimand&lt;/th>
&lt;th>Data&lt;/th>
&lt;th style="text-align:center">Estimate&lt;/th>
&lt;th style="text-align:center">SE&lt;/th>
&lt;th style="text-align:center">95% CI&lt;/th>
&lt;th style="text-align:center">Contains 0.12?&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>1&lt;/td>
&lt;td>Simple regression&lt;/td>
&lt;td>None&lt;/td>
&lt;td>ATE (offer)&lt;/td>
&lt;td>Endline&lt;/td>
&lt;td style="text-align:center">0.116&lt;/td>
&lt;td style="text-align:center">0.019&lt;/td>
&lt;td style="text-align:center">[0.078, 0.154]&lt;/td>
&lt;td style="text-align:center">Yes&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>2&lt;/td>
&lt;td>Regression Adjustment&lt;/td>
&lt;td>Outcome model&lt;/td>
&lt;td>ATE (offer)&lt;/td>
&lt;td>Endline&lt;/td>
&lt;td style="text-align:center">0.113&lt;/td>
&lt;td style="text-align:center">0.019&lt;/td>
&lt;td style="text-align:center">[0.075, 0.150]&lt;/td>
&lt;td style="text-align:center">Yes&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>3&lt;/td>
&lt;td>Regression Adjustment&lt;/td>
&lt;td>Outcome model&lt;/td>
&lt;td>ATT (offer)&lt;/td>
&lt;td>Endline&lt;/td>
&lt;td style="text-align:center">0.113&lt;/td>
&lt;td style="text-align:center">0.019&lt;/td>
&lt;td style="text-align:center">[0.076, 0.151]&lt;/td>
&lt;td style="text-align:center">Yes&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>4&lt;/td>
&lt;td>Inverse Prob. Weighting&lt;/td>
&lt;td>Treatment model&lt;/td>
&lt;td>ATE (offer)&lt;/td>
&lt;td>Endline&lt;/td>
&lt;td style="text-align:center">0.113&lt;/td>
&lt;td style="text-align:center">0.019&lt;/td>
&lt;td style="text-align:center">[0.075, 0.150]&lt;/td>
&lt;td style="text-align:center">Yes&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>5&lt;/td>
&lt;td>Inverse Prob. Weighting&lt;/td>
&lt;td>Treatment model&lt;/td>
&lt;td>ATT (offer)&lt;/td>
&lt;td>Endline&lt;/td>
&lt;td style="text-align:center">0.113&lt;/td>
&lt;td style="text-align:center">0.019&lt;/td>
&lt;td style="text-align:center">[0.076, 0.151]&lt;/td>
&lt;td style="text-align:center">Yes&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>6&lt;/td>
&lt;td>IPWRA (Doubly Robust)&lt;/td>
&lt;td>Both models&lt;/td>
&lt;td>ATE (offer)&lt;/td>
&lt;td>Endline&lt;/td>
&lt;td style="text-align:center">0.113&lt;/td>
&lt;td style="text-align:center">0.019&lt;/td>
&lt;td style="text-align:center">[0.075, 0.150]&lt;/td>
&lt;td style="text-align:center">Yes&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>7&lt;/td>
&lt;td>IPWRA (Doubly Robust)&lt;/td>
&lt;td>Both models&lt;/td>
&lt;td>ATT (offer)&lt;/td>
&lt;td>Endline&lt;/td>
&lt;td style="text-align:center">0.113&lt;/td>
&lt;td style="text-align:center">0.019&lt;/td>
&lt;td style="text-align:center">[0.076, 0.151]&lt;/td>
&lt;td style="text-align:center">Yes&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>8&lt;/td>
&lt;td>Basic DiD&lt;/td>
&lt;td>Panel FE&lt;/td>
&lt;td>ATT (offer)&lt;/td>
&lt;td>Panel&lt;/td>
&lt;td style="text-align:center">0.135&lt;/td>
&lt;td style="text-align:center">0.027&lt;/td>
&lt;td style="text-align:center">[0.081, 0.188]&lt;/td>
&lt;td style="text-align:center">Yes&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>9&lt;/td>
&lt;td>DR-DiD (&lt;code>drdid&lt;/code>)&lt;/td>
&lt;td>Both + Panel&lt;/td>
&lt;td>ATT (offer)&lt;/td>
&lt;td>Panel&lt;/td>
&lt;td style="text-align:center">0.137&lt;/td>
&lt;td style="text-align:center">0.027&lt;/td>
&lt;td style="text-align:center">[0.084, 0.191]&lt;/td>
&lt;td style="text-align:center">Yes&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>10&lt;/td>
&lt;td>DR-DiD (&lt;code>xthdidregress&lt;/code>)&lt;/td>
&lt;td>Both + Panel&lt;/td>
&lt;td>ATT (offer)&lt;/td>
&lt;td>Panel&lt;/td>
&lt;td style="text-align:center">0.137&lt;/td>
&lt;td style="text-align:center">0.027&lt;/td>
&lt;td style="text-align:center">[0.084, 0.191]&lt;/td>
&lt;td style="text-align:center">Yes&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>11&lt;/td>
&lt;td>Endogenous treatment (&lt;code>etregress&lt;/code>)&lt;/td>
&lt;td>IV&lt;/td>
&lt;td>ATE (receipt)&lt;/td>
&lt;td>Endline&lt;/td>
&lt;td style="text-align:center">0.147&lt;/td>
&lt;td style="text-align:center">0.025&lt;/td>
&lt;td style="text-align:center">[0.099, 0.195]&lt;/td>
&lt;td style="text-align:center">Yes&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>12&lt;/td>
&lt;td>DR receipt (&lt;code>teffects ipwra&lt;/code>)&lt;/td>
&lt;td>Both models&lt;/td>
&lt;td>ATE (receipt)&lt;/td>
&lt;td>Endline&lt;/td>
&lt;td style="text-align:center">0.117&lt;/td>
&lt;td style="text-align:center">0.032&lt;/td>
&lt;td style="text-align:center">[0.054, 0.180]&lt;/td>
&lt;td style="text-align:center">Yes&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;/td>
&lt;td>&lt;strong>True effect&lt;/strong>&lt;/td>
&lt;td>&lt;/td>
&lt;td>&lt;/td>
&lt;td>&lt;/td>
&lt;td style="text-align:center">&lt;strong>0.12&lt;/strong>&lt;/td>
&lt;td style="text-align:center">&lt;/td>
&lt;td style="text-align:center">&lt;/td>
&lt;td style="text-align:center">&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
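&lt;p>The last column of the table is read off the confidence intervals. An equivalent formal check is a Wald test of the null hypothesis that an estimate equals 0.12. The sketch below does this for row 12 only. The coefficient name &lt;code>ATE:r1vs0.D&lt;/code> follows Stata&amp;rsquo;s usual naming for &lt;code>teffects&lt;/code> results but is an assumption here, so the column names of &lt;code>e(b)&lt;/code> are listed first to confirm it.&lt;/p>
&lt;pre>&lt;code class="language-stata">use &amp;quot;https://github.com/quarcs-lab/data-open/raw/master/ametrics/dataSIM4RCT.dta&amp;quot;, clear
keep if post==1
quietly teffects ipwra (y y0 c.age i.female i.poverty c.edu) ///
    (D c.age i.female i.poverty c.edu treat), vce(robust)
* List coefficient names to confirm how the ATE is labeled
matrix list e(b)
* Wald test of H0: ATE of receipt = 0.12 (the simulated true effect)
test _b[ATE:r1vs0.D] = 0.12
&lt;/code>&lt;/pre>
&lt;p>A p-value above 0.05 corresponds to a &amp;ldquo;Yes&amp;rdquo; in the last column: the estimate cannot be distinguished from the true effect at the 5% level.&lt;/p>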
&lt;h3 id="four-key-takeaways">Four key takeaways&lt;/h3>
&lt;p>&lt;strong>1. RA vs. IPW vs. DR.&lt;/strong> In this well-designed RCT, the three covariate-adjusted cross-sectional approaches give the same point estimate to three decimals (0.113), close to the unadjusted regression (0.116). This convergence occurs because randomization makes the offer independent of household characteristics: the propensity score is correct by design, and a misspecified outcome model does not bias the estimate. In observational studies, where either model might be misspecified, the choice of method matters much more. Doubly robust methods are the safest bet because they remain consistent if either model is correct.&lt;/p>
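&lt;p>&lt;strong>2. ATE vs. ATT.&lt;/strong> For every cross-sectional method, the ATE and ATT point estimates coincide at 0.113, with only slightly different confidence intervals. This is what randomization of the offer predicts: offered households are a random sample of all households, so the average effect among them matches the population average. The homogeneous effects built into this simulation reinforce the result. ATE and ATT can diverge when treatment effects are heterogeneous and treatment status is related to the size of the effect, for example, if the program benefits poorer households more and poorer households are more likely to be treated. The researcher must choose the estimand that matches their policy question: ATE for scaling decisions, ATT for program evaluation.&lt;/p>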
&lt;p>&lt;strong>3. Cross-sectional vs. DiD.&lt;/strong> DiD estimates (0.135&amp;ndash;0.137) are slightly higher than cross-sectional estimates (0.113&amp;ndash;0.116), but all confidence intervals contain the true effect of 0.12. DiD&amp;rsquo;s main advantage is controlling for &lt;strong>time-invariant unobservable&lt;/strong> household characteristics &amp;mdash; less important in an RCT (where randomization handles confounding) but critical in quasi-experimental settings. DRDID extends the doubly robust logic to the panel setting, providing the most robust estimator in our toolkit. DiD inherently estimates the &lt;strong>ATT&lt;/strong> because its counterfactual is constructed specifically for the treated group.&lt;/p>
&lt;p>&lt;strong>4. Offer vs. receipt.&lt;/strong> The effect of actually receiving the cash transfer (0.117&amp;ndash;0.147) is larger than the effect of being offered it (0.113&amp;ndash;0.116), because imperfect compliance dilutes the offer-based estimates. The doubly robust receipt estimate (0.117) is closest to the true effect of 0.12, while the endogenous treatment model (0.147) is slightly higher. All confidence intervals contain 0.12.&lt;/p>
&lt;hr>
&lt;h2 id="12-summary-and-key-takeaways">12. Summary and key takeaways&lt;/h2>
&lt;p>The cash transfer program increased household consumption by approximately &lt;strong>11&amp;ndash;15%&lt;/strong> (0.113&amp;ndash;0.147 log points) across all estimation methods, close to the true effect of &lt;strong>12%&lt;/strong>. Every 95% confidence interval contained the true value, so every method recovered the true effect within sampling error.&lt;/p>
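&lt;p>Because the effect is measured in log points, reading a coefficient directly as a percentage is an approximation that loosens as estimates grow. A small sketch of the exact conversion, 100 × (exp(b) − 1), applied to the smallest estimate in the table, the true effect, and the largest estimate:&lt;/p>
&lt;pre>&lt;code class="language-stata">* Exact percent change implied by a log-point effect b: 100*(exp(b) - 1)
foreach b in 0.113 0.12 0.147 {
    display &amp;quot;b = &amp;quot; %5.3f (`b') &amp;quot; implies &amp;quot; %5.2f (100*(exp(`b') - 1)) &amp;quot;% change&amp;quot;
}
&lt;/code>&lt;/pre>
&lt;p>For the true effect of 0.12 log points, the exact change is about 12.7%, so the percentages quoted in this tutorial are best read as convenient approximations.&lt;/p>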
&lt;h3 id="seven-methodological-lessons">Seven methodological lessons&lt;/h3>
&lt;ol>
&lt;li>
&lt;p>&lt;strong>Always verify baseline balance&lt;/strong> before estimating treatment effects. Even with randomization, chance imbalances can occur &amp;mdash; as we saw with the gender variable (SMD = 9.3%).&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Be explicit about your estimand.&lt;/strong> ATE answers the policymaker&amp;rsquo;s question (&amp;ldquo;What if we scale this up?&amp;rdquo;), while ATT answers the evaluator&amp;rsquo;s question (&amp;ldquo;Did it help the participants?&amp;rdquo;). Different methods target different estimands.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Regression adjustment models the outcome; IPW models treatment assignment; doubly robust does both.&lt;/strong> These three approaches represent fundamentally different strategies for causal estimation. Understanding what each models &amp;mdash; and what can go wrong &amp;mdash; is essential for choosing the right method.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>In a well-designed RCT, all three approaches converge.&lt;/strong> But doubly robust methods provide insurance against model misspecification, making them the standard recommendation in modern causal inference.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Panel data controls for time-invariant unobservables&lt;/strong> that cross-sectional methods cannot address. By comparing each household to itself over time, DiD absorbs household fixed effects &amp;mdash; motivation, geography, family culture &amp;mdash; that are invisible to cross-sectional approaches.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>DiD inherently estimates the ATT&lt;/strong> because its counterfactual is specific to the treated group. The control group&amp;rsquo;s time trend provides a counterfactual for what the treated group would have experienced without the program &amp;mdash; but it does not tell us what would happen if the program were given to the untreated.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Doubly robust DiD (DRDID)&lt;/strong> extends the DR logic to the panel setting. It combines the power of DiD (controlling for household fixed effects) with the robustness of doubly robust estimation (protection against model misspecification), making it the most robust panel estimator covered in this tutorial.&lt;/p>
&lt;/li>
&lt;/ol>
&lt;h3 id="limitations">Limitations&lt;/h3>
&lt;ul>
&lt;li>This tutorial uses &lt;strong>simulated data&lt;/strong> with known parameters. Real-world data may exhibit more complex compliance patterns, heterogeneous effects, and missing data.&lt;/li>
&lt;li>The panel has only &lt;strong>two periods&lt;/strong> (baseline and endline), limiting our ability to test for pre-treatment trends or estimate dynamic treatment effects.&lt;/li>
&lt;li>Treatment effects are &lt;strong>homogeneous&lt;/strong> by construction. In practice, researchers should explore heterogeneity across subgroups.&lt;/li>
&lt;/ul>
&lt;h3 id="next-steps">Next steps&lt;/h3>
&lt;ul>
&lt;li>Apply these methods to &lt;strong>real-world RCT data&lt;/strong> from actual cash transfer programs&lt;/li>
&lt;li>Explore &lt;strong>heterogeneous treatment effects&lt;/strong> by gender, poverty status, or education level&lt;/li>
&lt;li>Extend to &lt;strong>multi-period panels&lt;/strong> with staggered treatment adoption, using modern DiD methods (Callaway and Sant&amp;rsquo;Anna, 2021)&lt;/li>
&lt;/ul>
&lt;hr>
&lt;h2 id="13-exercises">13. Exercises&lt;/h2>
&lt;ol>
&lt;li>
&lt;p>&lt;strong>Heterogeneous effects by gender.&lt;/strong> Estimate treatment effects separately for male-headed and female-headed households using IPWRA. Are the effects different? Does ATE still equal ATT when you restrict to subgroups?&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Model misspecification.&lt;/strong> Compare the RA, IPW, and DR estimates when you deliberately misspecify the outcome model by omitting &lt;code>edu&lt;/code> and &lt;code>age&lt;/code> from the covariate list. Which method is most robust to this misspecification? What does this tell you about the value of doubly robust estimation?&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Basic DiD vs. doubly robust DiD.&lt;/strong> Re-run the DiD analysis using the basic &lt;code>xtdidregress&lt;/code> command (no covariates) and compare it with the &lt;code>drdid&lt;/code> results (with covariates). How much do the estimates differ? What does this tell you about the role of covariate adjustment in DiD?&lt;/p>
&lt;/li>
&lt;/ol>
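&lt;p>One possible starting point for Exercise 2, offered as a sketch rather than a solution: the outcome model drops &lt;code>age&lt;/code> and &lt;code>edu&lt;/code>, while the treatment model for the offer &lt;code>treat&lt;/code> keeps the full covariate list. The specifications are assumptions modeled on the commands used earlier in the tutorial.&lt;/p>
&lt;pre>&lt;code class="language-stata">use &amp;quot;https://github.com/quarcs-lab/data-open/raw/master/ametrics/dataSIM4RCT.dta&amp;quot;, clear
keep if post==1
* RA with a deliberately misspecified outcome model (age and edu omitted)
teffects ra (y i.female i.poverty) (treat), vce(robust)
* IPW relies only on the treatment model, which keeps all covariates
teffects ipw (y) (treat c.age i.female i.poverty c.edu), vce(robust)
* IPWRA pairs the misspecified outcome model with the full treatment model
teffects ipwra (y i.female i.poverty) ///
    (treat c.age i.female i.poverty c.edu), vce(robust)
&lt;/code>&lt;/pre>
&lt;p>Because the offer is randomized, the three estimates may differ very little even under this misspecification. Repeating the comparison with receipt &lt;code>D&lt;/code> as the treatment is one way to see larger differences.&lt;/p>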
&lt;hr>
&lt;h2 id="references">References&lt;/h2>
&lt;ol>
&lt;li>&lt;a href="https://www.stata.com/manuals/teteffects.pdf" target="_blank" rel="noopener">Stata &lt;code>teffects&lt;/code> documentation &amp;mdash; Treatment-effects estimation&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://doi.org/10.1016/j.jeconom.2020.06.003" target="_blank" rel="noopener">Sant&amp;rsquo;Anna, P.H.C. &amp;amp; Zhao, J. (2020). Doubly Robust Difference-in-Differences Estimators. &lt;em>Journal of Econometrics&lt;/em>, 219(1), 101&amp;ndash;122&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://doi.org/10.1017/CBO9781139025751" target="_blank" rel="noopener">Imbens, G. &amp;amp; Rubin, D. (2015). &lt;em>Causal Inference for Statistics, Social, and Biomedical Sciences&lt;/em>. Cambridge University Press&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://friosavila.github.io/stpackages/drdid.html" target="_blank" rel="noopener">Rios-Avila, F., Sant&amp;rsquo;Anna, P.H.C., &amp;amp; Callaway, B. &lt;code>drdid&lt;/code> &amp;mdash; Doubly Robust DID estimators for Stata&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://dimewiki.worldbank.org/iebaltab" target="_blank" rel="noopener">World Bank &lt;code>ietoolkit&lt;/code> / &lt;code>iebaltab&lt;/code> documentation&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://tdmize.github.io/data/" target="_blank" rel="noopener">Mize, T. &lt;code>balanceplot&lt;/code> &amp;mdash; Stata module for covariate balance visualization&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://youtu.be/Gr_fu5deDMk" target="_blank" rel="noopener">RCT Analysis: Cash Transfers, Panel Data, and Doubly Robust Estimation (YouTube)&lt;/a>&lt;/li>
&lt;/ol>
&lt;h4 id="acknowledgements">Acknowledgements&lt;/h4>
&lt;p>AI tools (Claude Code, Gemini, NotebookLM) were used to make the contents of this post more accessible to students. Nevertheless, the content in this post may still contain errors. Caution is needed when applying the contents of this post to real research projects.&lt;/p></description></item></channel></rss>