<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>regression discontinuity | Carlos Mendez</title><link>https://carlos-mendez.org/tag/regression-discontinuity/</link><atom:link href="https://carlos-mendez.org/tag/regression-discontinuity/index.xml" rel="self" type="application/rss+xml"/><description>regression discontinuity</description><generator>Wowchemy (https://wowchemy.com)</generator><language>en-us</language><copyright>Carlos Mendez</copyright><lastBuildDate>Thu, 23 Apr 2026 00:00:00 +0000</lastBuildDate><image><url>https://carlos-mendez.org/media/icon_huedfae549300b4ca5d201a9bd09a3ecd5_79625_512x512_fill_lanczos_center_3.png</url><title>regression discontinuity</title><link>https://carlos-mendez.org/tag/regression-discontinuity/</link></image><item><title>Regression Discontinuity Design (RDD) in Stata: Evaluating a Tutoring Program</title><link>https://carlos-mendez.org/post/stata_rd/</link><pubDate>Thu, 23 Apr 2026 00:00:00 +0000</pubDate><guid>https://carlos-mendez.org/post/stata_rd/</guid><description>&lt;h2 id="1-overview">1. Overview&lt;/h2>
&lt;p>Educational institutions constantly strive to improve student outcomes and close achievement gaps. One common strategy is to provide targeted tutoring to students who are struggling academically. But does tutoring actually work? And how can we rigorously measure its effect when we cannot randomly assign students to receive it?&lt;/p>
&lt;p>In this tutorial, we evaluate a school tutoring program using &lt;strong>regression discontinuity design (RDD)&lt;/strong> &amp;mdash; one of the most credible quasi-experimental methods available. A school district administered a standardized entrance exam to all students and automatically enrolled anyone who scored &lt;strong>70 or below&lt;/strong> into a free tutoring program. Students above the cutoff received no tutoring. Because of this sharp, rule-based assignment, students who scored just below 70 are nearly identical to those who scored just above &amp;mdash; they just happened to fall on different sides of the threshold. By comparing outcomes for students near the cutoff, we can estimate the &lt;strong>causal effect&lt;/strong> of tutoring on end-of-year exit exam scores.&lt;/p>
&lt;p>The central question is: &lt;strong>Did the tutoring program improve student performance on the exit exam?&lt;/strong> RDD allows us to answer this question credibly because the assignment rule creates a natural experiment at the cutoff.&lt;/p>
&lt;pre>&lt;code class="language-mermaid">graph LR
A[&amp;quot;Entrance Exam&amp;lt;br/&amp;gt;Score&amp;quot;] --&amp;gt; B{&amp;quot;Score &amp;lt;= 70?&amp;quot;}
B -- Yes --&amp;gt; C[&amp;quot;Tutoring&amp;lt;br/&amp;gt;Program&amp;quot;]
B -- No --&amp;gt; D[&amp;quot;No Tutoring&amp;quot;]
C --&amp;gt; E[&amp;quot;Exit Exam&amp;lt;br/&amp;gt;Score&amp;quot;]
D --&amp;gt; E
style A fill:#6a9bcc,stroke:#141413,color:#fff
style B fill:#d97757,stroke:#141413,color:#fff
style C fill:#00d4c8,stroke:#141413,color:#fff
style D fill:#141413,stroke:#141413,color:#fff
style E fill:#6a9bcc,stroke:#141413,color:#fff
&lt;/code>&lt;/pre>
&lt;p>The diagram above shows the assignment mechanism. The entrance exam score is the &lt;em>running variable&lt;/em> &amp;mdash; the variable that determines treatment. The cutoff at 70 creates a sharp boundary: everyone below gets tutoring, everyone above does not. This sharp rule is what makes the design credible.&lt;/p>
&lt;h3 id="learning-objectives">Learning objectives&lt;/h3>
&lt;p>By the end of this tutorial, you will be able to:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Understand&lt;/strong> what regression discontinuity design is and when it applies&lt;/li>
&lt;li>&lt;strong>Verify&lt;/strong> whether an RDD is sharp or fuzzy using cross-tabulations&lt;/li>
&lt;li>&lt;strong>Estimate&lt;/strong> the local average treatment effect (LATE) using both parametric OLS and nonparametric methods&lt;/li>
&lt;li>&lt;strong>Assess&lt;/strong> the validity of an RDD using density tests, placebo cutoffs, and bandwidth sensitivity&lt;/li>
&lt;li>&lt;strong>Implement&lt;/strong> the &lt;code>rdrobust&lt;/code> and &lt;code>rddensity&lt;/code> packages in Stata&lt;/li>
&lt;/ul>
&lt;hr>
&lt;h2 id="2-key-concepts">2. Key concepts&lt;/h2>
&lt;p>Before diving into the analysis, let us define the key concepts.&lt;/p>
&lt;p>&lt;strong>Running variable.&lt;/strong> The variable that determines treatment assignment &amp;mdash; in our case, the entrance exam score. The running variable must be continuous (or at least finely graded) so that students near the cutoff are comparable.&lt;/p>
&lt;p>&lt;strong>Cutoff.&lt;/strong> The threshold value of the running variable that separates treated from untreated units. Here, the cutoff is 70 points.&lt;/p>
&lt;p>&lt;strong>Sharp vs. fuzzy design.&lt;/strong> In a &lt;em>sharp&lt;/em> RDD, treatment is a deterministic function of the running variable at the cutoff: everyone below gets treated, everyone above does not. In a &lt;em>fuzzy&lt;/em> RDD, the cutoff only changes the &lt;em>probability&lt;/em> of treatment &amp;mdash; some units may cross over. Our design is sharp.&lt;/p>
&lt;p>&lt;strong>Local average treatment effect (LATE).&lt;/strong> RDD identifies the causal effect only for units at the cutoff &amp;mdash; those who scored exactly 70. We cannot extrapolate this effect to students who scored 30 or 95. Think of it as measuring the treatment effect through a narrow window around the threshold.&lt;/p>
&lt;p>&lt;strong>Identifying assumption.&lt;/strong> RDD requires that &lt;em>potential outcomes&lt;/em> &amp;mdash; what a student would score on the exit exam with or without tutoring &amp;mdash; are smooth (continuous) through the cutoff. In plain language: absent the tutoring program, students who scored 69 and 71 would have performed similarly on the exit exam. If students could precisely manipulate their entrance exam scores to fall below 70, this assumption would fail.&lt;/p>
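&lt;p>Formally, the smoothness condition can be sketched in potential-outcomes notation, where \(Y_i(0)\) and \(Y_i(1)\) denote a student&amp;rsquo;s exit score without and with tutoring (this notation is illustrative and not used elsewhere in the tutorial):&lt;/p>
&lt;p>$$
\lim_{x \uparrow 70} E[Y_i(0) \mid X_i = x] = \lim_{x \downarrow 70} E[Y_i(0) \mid X_i = x]
$$&lt;/p>
&lt;p>and likewise for \(Y_i(1)\). If both conditional expectations are continuous at the cutoff, any jump in &lt;em>observed&lt;/em> exit scores at 70 can be attributed to tutoring alone.&lt;/p>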
&lt;hr>
&lt;h2 id="3-analytical-roadmap">3. Analytical roadmap&lt;/h2>
&lt;p>Our analysis follows a &amp;ldquo;see it, verify it, estimate it, stress-test it&amp;rdquo; logic:&lt;/p>
&lt;pre>&lt;code class="language-mermaid">graph TD
A[&amp;quot;Load data &amp;amp;&amp;lt;br/&amp;gt;explore&amp;quot;] --&amp;gt; B[&amp;quot;Verify sharp&amp;lt;br/&amp;gt;design&amp;quot;]
B --&amp;gt; C[&amp;quot;Visualize the&amp;lt;br/&amp;gt;discontinuity&amp;quot;]
C --&amp;gt; D[&amp;quot;Parametric&amp;lt;br/&amp;gt;estimation (OLS)&amp;quot;]
D --&amp;gt; E[&amp;quot;Nonparametric&amp;lt;br/&amp;gt;estimation (rdrobust)&amp;quot;]
E --&amp;gt; F[&amp;quot;Robustness&amp;lt;br/&amp;gt;checks&amp;quot;]
F --&amp;gt; G[&amp;quot;Conclusions&amp;quot;]
style A fill:#6a9bcc,stroke:#141413,color:#fff
style B fill:#6a9bcc,stroke:#141413,color:#fff
style C fill:#6a9bcc,stroke:#141413,color:#fff
style D fill:#d97757,stroke:#141413,color:#fff
style E fill:#d97757,stroke:#141413,color:#fff
style F fill:#00d4c8,stroke:#141413,color:#fff
style G fill:#00d4c8,stroke:#141413,color:#fff
&lt;/code>&lt;/pre>
&lt;p>We start by understanding the data and verifying the sharp design. Then we visualize the discontinuity to build intuition before any estimation. Next, we estimate the treatment effect using both parametric (OLS) and nonparametric (rdrobust) methods. Finally, we stress-test the results with bandwidth sensitivity analysis, kernel comparisons, a McCrary density test, and placebo cutoff tests.&lt;/p>
&lt;hr>
&lt;h2 id="4-data-loading-and-exploration">4. Data loading and exploration&lt;/h2>
&lt;p>We begin by loading the data and examining the key variables. The dataset contains 1,000 students with their entrance exam scores, exit exam scores, and tutoring status.&lt;/p>
&lt;pre>&lt;code class="language-stata">* Import data
use &amp;quot;https://github.com/quarcs-lab/data-open/raw/master/isds/tutoring.dta&amp;quot;, clear
des
sum
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">Contains data from tutoring.dta
Observations: 1,000
Variables: 5
Variable Storage Display Value
name type format label Variable label
-------------------------------------------------------------------------------
id int %8.0g ID of student
entrance_exam float %9.0g Entrance exam score
tutoring_text str8 %9s Enrolled in the tutoring program?
exit_exam float %9.0g Exit exam score
tutoring float %9.0g Enrolled in the tutoring program?
(Yes = 1, No=0)
Variable | Obs Mean Std. dev. Min Max
-------------+---------------------------------------------------------
id | 1,000 500.5 288.8194 1 1000
entrance_e~m | 1,000 78.1427 12.7265 28.8 99.8
exit_exam | 1,000 66.1646 7.625894 42.8 84.5
tutoring | 1,000 .241 .4279043 0 1
&lt;/code>&lt;/pre>
&lt;p>The dataset contains 1,000 students with entrance exam scores ranging from 28.8 to 99.8 (mean = 78.1, SD = 12.7) and exit exam scores ranging from 42.8 to 84.5 (mean = 66.2, SD = 7.6). Of the 1,000 students, 241 (24.1%) were enrolled in the tutoring program. The entrance exam distribution is left-skewed &amp;mdash; most students scored above the 70-point cutoff, with a longer tail of low scores &amp;mdash; which explains why only about a quarter of the sample received tutoring.&lt;/p>
&lt;p>We also create two helper variables: a numeric treatment indicator (&lt;code>treat&lt;/code>) and a centered version of the running variable (&lt;code>centered&lt;/code>). Centering at the cutoff makes the treatment coefficient directly interpretable as the discontinuity jump.&lt;/p>
&lt;pre>&lt;code class="language-stata">* Create treatment indicator and center the running variable
clonevar treat = tutoring
gen centered = entrance_exam - 70
&lt;/code>&lt;/pre>
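&lt;p>As a quick programmatic sanity check (a sketch that anticipates the cross-tabulation in Section 5, not part of the original do-file), we can assert that the helper variables encode the assignment rule correctly. Both assertions pass silently if the sharp rule holds:&lt;/p>
&lt;pre>&lt;code class="language-stata">* Illustrative checks: treatment matches the rule, and
* treated students have non-positive centered scores
assert treat == (entrance_exam &amp;lt;= 70)
assert centered &amp;lt;= 0 if treat == 1
&lt;/code>&lt;/pre>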
&lt;p>Now let us verify whether the design is truly sharp.&lt;/p>
&lt;hr>
&lt;h2 id="5-verifying-the-sharp-design">5. Verifying the sharp design&lt;/h2>
&lt;p>In a sharp RDD, treatment assignment is a deterministic function of the running variable at the cutoff. Every student at or below 70 must receive tutoring, and every student above 70 must not. Any deviation would make this a &lt;em>fuzzy&lt;/em> RDD, requiring a different estimation strategy.&lt;/p>
&lt;pre>&lt;code class="language-stata">* Cross-tabulate treatment by position relative to cutoff
gen byte below_cutoff = (entrance_exam &amp;lt;= 70)
tab below_cutoff treat, row
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text"> Scored at |
or below | Tutoring program (1 =
cutoff | Yes, 0 = No)
(70) | 0 1 | Total
-----------+----------------------+----------
0 | 759 0 | 759
| 100.00 0.00 | 100.00
-----------+----------------------+----------
1 | 0 241 | 241
| 0.00 100.00 | 100.00
-----------+----------------------+----------
Total | 759 241 | 1,000
| 75.90 24.10 | 100.00
&lt;/code>&lt;/pre>
&lt;p>The cross-tabulation confirms perfect compliance: 100% of students at or below the cutoff (n = 241) received tutoring, and 100% of students above the cutoff (n = 759) did not. There are zero crossovers in either direction, confirming that this is a &lt;strong>sharp&lt;/strong> RDD. This is the strongest possible form of the design &amp;mdash; treatment status is completely determined by the entrance exam score, with no exceptions.&lt;/p>
&lt;p>With the sharp design confirmed, let us now visualize the discontinuity in outcomes.&lt;/p>
&lt;hr>
&lt;h2 id="6-visualizing-the-discontinuity">6. Visualizing the discontinuity&lt;/h2>
&lt;h3 id="61-raw-scatter-plot">6.1 Raw scatter plot&lt;/h3>
&lt;p>The signature visualization in any RDD is a scatter plot of the outcome against the running variable, with a vertical line at the cutoff. If the tutoring program has an effect, we should see a &lt;em>jump&lt;/em> in exit exam scores at the cutoff.&lt;/p>
&lt;pre>&lt;code class="language-stata">twoway (scatter exit_exam entrance_exam if treat==1, ///
mcolor(&amp;quot;106 155 204&amp;quot;) msize(small) msymbol(circle)) ///
(scatter exit_exam entrance_exam if treat==0, ///
mcolor(&amp;quot;217 119 87&amp;quot;) msize(small) msymbol(circle)), ///
xline(70, lcolor(black) lwidth(medium) lpattern(dash)) ///
legend(order(1 &amp;quot;Tutored&amp;quot; 2 &amp;quot;Not tutored&amp;quot;) position(5) ring(0) col(1)) ///
title(&amp;quot;Exit Exam Scores by Entrance Exam Score&amp;quot;) ///
xtitle(&amp;quot;Entrance Exam Score&amp;quot;) ytitle(&amp;quot;Exit Exam Score&amp;quot;)
&lt;/code>&lt;/pre>
&lt;p>&lt;img src="stata_rd_fig1_scatter_raw.png" alt="Raw scatter plot of exit exam vs entrance exam scores with a vertical dashed cutoff line at 70.">
&lt;em>Figure 1: Exit exam scores by entrance exam score. Blue dots are tutored students (score ≤ 70), orange dots are non-tutored. The dashed line marks the cutoff at 70.&lt;/em>&lt;/p>
&lt;p>The scatter plot reveals a clear pattern: tutored students (blue dots, left of cutoff) tend to score higher on the exit exam than what the non-tutored trend (orange dots, right of cutoff) would predict at the same entrance exam levels. Both groups show a positive relationship between entrance and exit scores &amp;mdash; higher entrance scores predict higher exit scores &amp;mdash; but there is a visible upward shift for the tutored group near the cutoff.&lt;/p>
&lt;h3 id="62-rd-plot-with-fitted-lines">6.2 RD plot with fitted lines&lt;/h3>
&lt;p>To see the discontinuity more clearly, we use &lt;code>rdplot&lt;/code> from the &lt;code>rdrobust&lt;/code> package. This command creates a binned scatter plot with local polynomial fits on each side of the cutoff. Think of it as a cleaned-up version of the raw scatter that smooths out individual variation.&lt;/p>
&lt;pre>&lt;code class="language-stata">ssc install rdrobust, replace
rdplot exit_exam entrance_exam, c(70) p(1)
&lt;/code>&lt;/pre>
&lt;p>&lt;img src="stata_rd_fig3_rdplot.png" alt="RD plot with binned sample averages and local linear fits on each side of the cutoff.">
&lt;em>Figure 2: RD plot with binned averages and local linear fits. The downward jump at the cutoff reveals the tutoring effect.&lt;/em>&lt;/p>
&lt;p>The RD plot makes the discontinuity unmistakable. The binned averages (dots) follow a clear upward trend on both sides of the cutoff, but there is a sharp &lt;strong>downward jump&lt;/strong> at 70 when moving from left to right. The local linear fits (lines) show that tutored students near the cutoff score about 8&amp;ndash;10 points higher than what the non-tutored trend would predict. This visual evidence strongly suggests that the tutoring program has a meaningful positive effect.&lt;/p>
&lt;blockquote>
&lt;p>&lt;strong>Why is the jump downward?&lt;/strong> Because the plot moves left to right: from the tutored group (higher scores due to tutoring) to the non-tutored group. The jump &lt;em>down&lt;/em> means tutored students score &lt;em>higher&lt;/em> than non-tutored students at the cutoff.&lt;/p>
&lt;/blockquote>
&lt;p>Next, we move from visual evidence to formal estimation.&lt;/p>
&lt;hr>
&lt;h2 id="7-parametric-estimation-ols">7. Parametric estimation (OLS)&lt;/h2>
&lt;p>The simplest way to estimate the RDD treatment effect is with ordinary least squares (OLS). We regress the exit exam score on the entrance exam score and a treatment indicator. The coefficient on the treatment indicator captures the jump in exit scores at the cutoff.&lt;/p>
&lt;h3 id="71-simple-linear-model">7.1 Simple linear model&lt;/h3>
&lt;p>This model assumes the same linear slope on both sides of the cutoff:&lt;/p>
&lt;p>$$
\text{exit}_i = \beta_0 + \beta_1 \cdot \text{entrance}_i + \tau \cdot \text{treat}_i + \varepsilon_i
$$&lt;/p>
&lt;p>In words, this says: a student&amp;rsquo;s exit exam score depends on their entrance exam score (the slope \(\beta_1\)) plus a jump of \(\tau\) points if they received tutoring. The coefficient \(\tau\) is our estimate of the treatment effect.&lt;/p>
&lt;pre>&lt;code class="language-stata">reg exit_exam entrance_exam treat, robust
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">Linear regression Number of obs = 1,000
F(2, 997) = 199.06
Prob &amp;gt; F = 0.0000
R-squared = 0.2685
Root MSE = 6.5288
------------------------------------------------------------------------------
| Robust
exit_exam | Coefficient std. err. t P&amp;gt;|t| [95% conf. interval]
-------------+----------------------------------------------------------------
entrance_e~m | .5097654 .0260511 19.57 0.000 .4586441 .5608868
treat | 10.80043 .8063233 13.39 0.000 9.21815 12.38272
_cons | 23.72725 2.202253 10.77 0.000 19.40566 28.04883
------------------------------------------------------------------------------
&lt;/code>&lt;/pre>
&lt;p>The simple linear model estimates that tutoring raises exit exam scores by &lt;strong>10.80 points&lt;/strong> (95% CI: 9.22 to 12.38, p &amp;lt; 0.001). The entrance exam slope of 0.51 means that each additional point on the entrance exam is associated with about half a point higher on the exit exam. The model explains 26.9% of the variation in exit exam scores (R-squared = 0.2685). This estimate uses the full sample of 1,000 students and assumes a common linear relationship between entrance and exit exam scores on both sides of the cutoff.&lt;/p>
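&lt;p>To make the coefficients concrete, we can compute the model&amp;rsquo;s predicted exit scores for a student right at the cutoff by hand (an illustrative back-of-the-envelope calculation using the rounded estimates):&lt;/p>
&lt;pre>&lt;code class="language-stata">* Predicted exit score at entrance = 70, without and with tutoring
display 23.727 + 0.510*70            // no tutoring: about 59.4
display 23.727 + 0.510*70 + 10.800   // with tutoring: about 70.2
&lt;/code>&lt;/pre>
&lt;p>The roughly 11-point gap between the two predictions is exactly the &lt;code>treat&lt;/code> coefficient.&lt;/p>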
&lt;h3 id="72-allowing-different-slopes">7.2 Allowing different slopes&lt;/h3>
&lt;p>The simple model forces the same slope on both sides of the cutoff. A more flexible specification allows different slopes by interacting the centered running variable with the treatment indicator. We also fit a quadratic specification:&lt;/p>
&lt;pre>&lt;code class="language-stata">* Model 2: Different slopes on each side
gen interact = centered * treat
reg exit_exam centered treat interact, robust
estimates store m2_interact
* Model 3: Quadratic specification
gen centered2 = centered^2
reg exit_exam centered centered2 treat ///
c.centered#c.treat c.centered2#c.treat, robust
estimates store m3_quadratic
&lt;/code>&lt;/pre>
&lt;h3 id="73-comparing-parametric-models">7.3 Comparing parametric models&lt;/h3>
&lt;pre>&lt;code class="language-stata">estimates table m1_linear m2_interact m3_quadratic, ///
b(%9.3f) se(%9.3f) stats(r2 N)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">--------------------------------------------------
Variable | m1_linear m2_inte~t m3_quad~c
-------------+------------------------------------
treat | 10.800 10.797 9.223
| 0.806 0.816 1.198
centered | 0.510 0.328
| 0.032 0.125
interact | -0.001
| 0.055
centered2 | 0.007
| 0.004
-------------+------------------------------------
r2 | 0.268 0.268 0.271
N | 1000 1000 1000
--------------------------------------------------
Legend: b/se
&lt;/code>&lt;/pre>
&lt;p>The three parametric specifications tell a consistent story. Model 1 (same slope) and Model 2 (different slopes) produce nearly identical treatment effects of 10.800 and 10.797 points, respectively. The interaction term in Model 2 is essentially zero (-0.001, p = 0.98), indicating that the relationship between entrance and exit exams has the same slope on both sides of the cutoff. Model 3 (quadratic) gives a somewhat lower estimate of 9.22 points with a wider confidence interval (SE = 1.20 vs 0.81). All three R-squared values are virtually identical (0.268&amp;ndash;0.271), suggesting that higher-order polynomials add no meaningful explanatory power.&lt;/p>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th>Model&lt;/th>
&lt;th>Specification&lt;/th>
&lt;th>Treatment effect&lt;/th>
&lt;th>SE&lt;/th>
&lt;th>R-squared&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>1&lt;/td>
&lt;td>Linear, same slope&lt;/td>
&lt;td>10.800&lt;/td>
&lt;td>0.806&lt;/td>
&lt;td>0.268&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>2&lt;/td>
&lt;td>Linear, different slopes&lt;/td>
&lt;td>10.797&lt;/td>
&lt;td>0.816&lt;/td>
&lt;td>0.268&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>3&lt;/td>
&lt;td>Quadratic&lt;/td>
&lt;td>9.223&lt;/td>
&lt;td>1.198&lt;/td>
&lt;td>0.271&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
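&lt;p>A formal way to check whether the quadratic terms matter (a sketch, assuming the stored Model 3 estimates are still in memory) is a joint Wald test:&lt;/p>
&lt;pre>&lt;code class="language-stata">* Jointly test the quadratic terms from Model 3
estimates restore m3_quadratic
test centered2 c.centered2#c.treat
&lt;/code>&lt;/pre>
&lt;p>A non-rejection here would be consistent with the near-identical R-squared values across the three specifications.&lt;/p>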
&lt;p>However, parametric models impose functional form assumptions on the entire sample. The nonparametric approach, which we turn to next, avoids these assumptions by focusing only on observations near the cutoff.&lt;/p>
&lt;hr>
&lt;h2 id="8-nonparametric-estimation-with-rdrobust">8. Nonparametric estimation with rdrobust&lt;/h2>
&lt;p>The &lt;a href="https://rdpackages.github.io/rdrobust/" target="_blank" rel="noopener">&lt;code>rdrobust&lt;/code>&lt;/a> package implements the data-driven, nonparametric RDD estimation procedure from Cattaneo, Idrobo, and Titiunik (2019). Instead of fitting a regression through the entire sample, &lt;code>rdrobust&lt;/code> focuses on observations within a &lt;em>bandwidth&lt;/em> &amp;mdash; a narrow window around the cutoff &amp;mdash; and fits local polynomials on each side. The bandwidth is chosen automatically to minimize mean squared error.&lt;/p>
&lt;p>$$
\hat{\tau}_{RD} = \lim_{x \downarrow c} E[Y_i | X_i = x] - \lim_{x \uparrow c} E[Y_i | X_i = x]
$$&lt;/p>
&lt;p>In words, the RD estimator measures the difference between the expected outcome just to the right of the cutoff (\(c\)) and just to the left. If there is a jump in outcomes at the cutoff, that jump is the treatment effect.&lt;/p>
&lt;pre>&lt;code class="language-stata">rdrobust exit_exam entrance_exam, c(70)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">Sharp RD estimates using local polynomial regression.
Cutoff c = 70 | Left of c Right of c Number of obs = 1000
-------------------+---------------------- BW type = mserd
Number of obs | 237 763 Kernel = Triangular
Eff. Number of obs | 144 256 VCE method = NN
Order est. (p) | 1 1
Order bias (q) | 2 2
BW est. (h) | 9.984 9.984
BW bias (b) | 14.578 14.578
rho (h/b) | 0.685 0.685
| Point | Robust Inference
| Estimate | z-stat P&amp;gt;|z| [95% Conf. Interval]
-------------------+--------------------------------------------------------------
RD Effect | -8.5793 | -4.3034 0.000 -12.1422 -4.54297
&lt;/code>&lt;/pre>
&lt;p>The nonparametric estimator finds an RD effect of &lt;strong>-8.58 points&lt;/strong> (robust 95% CI: -12.14 to -4.54, p &amp;lt; 0.001). The MSE-optimal bandwidth is 9.98 points, meaning only students who scored between 60 and 80 on the entrance exam are used in the estimation &amp;mdash; 400 students in total (144 below the cutoff, 256 above).&lt;/p>
&lt;blockquote>
&lt;p>&lt;strong>Why is the sign negative?&lt;/strong> The &lt;code>rdrobust&lt;/code> convention estimates the jump from left to right across the cutoff. Since tutored students (left side, below 70) score &lt;em>higher&lt;/em> than non-tutored students (right side, above 70), the jump going rightward is downward &amp;mdash; hence negative. This is the same finding as the positive parametric coefficient on &lt;code>treat&lt;/code> (+10.80): both tell us that tutoring improves scores. The magnitude is slightly smaller (8.6 vs 10.8) because &lt;code>rdrobust&lt;/code> focuses only on students near the cutoff rather than the full sample.&lt;/p>
&lt;/blockquote>
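&lt;p>If the negative sign is distracting, one common trick (not part of the original analysis) is to flip the running variable so that treated units lie to the &lt;em>right&lt;/em> of the cutoff; the same estimate then comes out with the intuitive positive sign:&lt;/p>
&lt;pre>&lt;code class="language-stata">* Flip the running variable so tutoring is the right-hand side of the cutoff
gen flipped = -(entrance_exam - 70)
rdrobust exit_exam flipped, c(0)   // RD effect now reported with a positive sign (about +8.6)
&lt;/code>&lt;/pre>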
&lt;p>To verify that the result is not sensitive to the choice of kernel weighting function, we also estimate with uniform and Epanechnikov kernels:&lt;/p>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th>Kernel&lt;/th>
&lt;th>Bandwidth&lt;/th>
&lt;th>RD effect&lt;/th>
&lt;th>95% CI&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>Triangular&lt;/td>
&lt;td>9.984&lt;/td>
&lt;td>-8.579&lt;/td>
&lt;td>[-12.142, -4.543]&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Uniform&lt;/td>
&lt;td>7.223&lt;/td>
&lt;td>-8.200&lt;/td>
&lt;td>[-11.775, -4.049]&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Epanechnikov&lt;/td>
&lt;td>8.179&lt;/td>
&lt;td>-8.388&lt;/td>
&lt;td>[-12.175, -4.197]&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;p>All three kernels produce estimates between -8.20 and -8.58, all significant at p &amp;lt; 0.001. The choice of kernel function has minimal impact on the results.&lt;/p>
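&lt;p>The kernel comparison in the table above can be reproduced with the &lt;code>kernel()&lt;/code> option of &lt;code>rdrobust&lt;/code> (a sketch; exact numbers may differ slightly across package versions):&lt;/p>
&lt;pre>&lt;code class="language-stata">* Re-estimate with each kernel, printing bandwidth and conventional estimate
foreach k in triangular uniform epanechnikov {
quietly rdrobust exit_exam entrance_exam, c(70) kernel(`k')
di &amp;quot;`k'&amp;quot; _col(16) %9.3f e(h_l) _col(30) %9.3f e(tau_cl)
}
&lt;/code>&lt;/pre>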
&lt;hr>
&lt;h2 id="9-robustness-and-validity-checks">9. Robustness and validity checks&lt;/h2>
&lt;p>A credible RDD requires more than just a significant estimate. We need to verify the assumptions underlying the design. This section presents three robustness checks: bandwidth sensitivity, a McCrary density test, and placebo cutoff tests.&lt;/p>
&lt;h3 id="91-bandwidth-sensitivity">9.1 Bandwidth sensitivity&lt;/h3>
&lt;p>A key concern in RDD is that results might depend on the analyst&amp;rsquo;s choice of bandwidth. If the estimate changes dramatically when the bandwidth is widened or narrowed, the finding is fragile. We estimate the RD effect at bandwidths ranging from 5 to 20 points around the cutoff:&lt;/p>
&lt;pre>&lt;code class="language-stata">foreach bw in 5 7 10 12 15 20 {
quietly rdrobust exit_exam entrance_exam, c(70) h(`bw')
di &amp;quot;`bw'&amp;quot; _col(10) %9.3f e(tau_cl) _col(23) %9.3f e(se_tau_cl)
}
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">BW Coef SE p-value
---- --------- --------- ---------
5 -8.202 2.337 0.000
7 -8.237 1.919 0.000
10 -8.581 1.615 0.000
12 -8.675 1.486 0.000
15 -8.842 1.312 0.000
20 -9.157 1.131 0.000
&lt;/code>&lt;/pre>
&lt;p>The estimate is remarkably stable: it ranges from -8.20 (BW = 5) to -9.16 (BW = 20), a spread of less than 1 point. All estimates are significant at p &amp;lt; 0.001, even at the narrowest bandwidth of 5 points where only a handful of students are used. As expected, narrower bandwidths produce larger standard errors (2.34 at BW = 5 vs 1.13 at BW = 20) because they use fewer observations. This stability is strong evidence that the estimated effect is genuine.&lt;/p>
&lt;h3 id="92-mccrary-density-test">9.2 McCrary density test&lt;/h3>
&lt;p>The key identifying assumption of RDD is that students cannot precisely manipulate their entrance exam scores to fall on a specific side of the cutoff. If students could &amp;mdash; for example, by deliberately scoring below 70 to qualify for tutoring &amp;mdash; we would see unusual bunching in the distribution of entrance scores near 70.&lt;/p>
&lt;p>The McCrary density test, implemented via the &lt;a href="https://rdpackages.github.io/rddensity/" target="_blank" rel="noopener">&lt;code>rddensity&lt;/code>&lt;/a> package, formally tests whether the density of the running variable is continuous at the cutoff:&lt;/p>
&lt;pre>&lt;code class="language-stata">ssc install rddensity, replace
rddensity entrance_exam, c(70)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">RD Manipulation test using local polynomial density estimation.
c = 70.000 | Left of c Right of c
-------------------+----------------------
Number of obs | 237 763
Eff. Number of obs | 208 577
BW est. (h) | 22.444 19.966
Method | T P&amp;gt;|T|
-------------------+----------------------
Robust | -0.5521 0.5809
&lt;/code>&lt;/pre>
&lt;p>The density test yields a p-value of &lt;strong>0.58&lt;/strong>, providing no evidence that students manipulated their entrance exam scores. Under the null hypothesis that the density is continuous at the cutoff, p-values are uniformly distributed between 0 and 1, so a value of 0.58 gives no reason to reject the null of no manipulation. This supports the validity of the research design.&lt;/p>
&lt;p>We can visualize this by plotting kernel density estimates separately for each side of the cutoff:&lt;/p>
&lt;p>&lt;img src="stata_rd_fig4_density_test.png" alt="Kernel density estimates of the running variable plotted separately for observations below and above the cutoff.">
&lt;em>Figure 3: Density of entrance exam scores on each side of the cutoff. Similar heights at the threshold indicate no manipulation (McCrary test p = 0.58).&lt;/em>&lt;/p>
&lt;p>The two density curves approach the cutoff at similar heights, visually confirming the absence of bunching. There is no pile-up of students just below 70, which would be the tell-tale sign of manipulation.&lt;/p>
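&lt;p>Figure 3 can be reproduced with a sketch along these lines (the styling options are illustrative choices, not the exact ones used for the figure):&lt;/p>
&lt;pre>&lt;code class="language-stata">twoway (kdensity entrance_exam if entrance_exam &amp;lt;= 70) ///
(kdensity entrance_exam if entrance_exam &amp;gt; 70), ///
xline(70, lpattern(dash)) ///
legend(order(1 &amp;quot;Below cutoff&amp;quot; 2 &amp;quot;Above cutoff&amp;quot;)) ///
xtitle(&amp;quot;Entrance Exam Score&amp;quot;) ytitle(&amp;quot;Density&amp;quot;)
&lt;/code>&lt;/pre>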
&lt;p>We can also check the histogram of the raw entrance exam scores:&lt;/p>
&lt;p>&lt;img src="stata_rd_fig2_histogram_running.png" alt="Histogram of entrance exam scores with a vertical cutoff line at 70.">
&lt;em>Figure 4: Distribution of entrance exam scores. No bunching or heaping is visible near the 70-point cutoff.&lt;/em>&lt;/p>
&lt;p>The histogram shows a smooth transition through the cutoff with no visible spike or dip near 70. Together, the formal test (p = 0.58), the density plot, and the histogram all support the no-manipulation assumption.&lt;/p>
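&lt;p>The histogram in Figure 4 can be drawn with a single command (the bin width is an illustrative choice):&lt;/p>
&lt;pre>&lt;code class="language-stata">histogram entrance_exam, width(2) xline(70, lpattern(dash)) ///
xtitle(&amp;quot;Entrance Exam Score&amp;quot;)
&lt;/code>&lt;/pre>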
&lt;h3 id="93-placebo-cutoff-tests">9.3 Placebo cutoff tests&lt;/h3>
&lt;p>If the tutoring effect is genuine, the discontinuity should appear &lt;em>only&lt;/em> at the true cutoff of 70. If we test for discontinuities at other values (50, 55, 60, 65, 75, 80, 85, 90), we should find nothing significant &amp;mdash; there is no reason for a jump in exit scores at, say, 55 or 85.&lt;/p>
&lt;pre>&lt;code class="language-stata">foreach c in 50 55 60 65 70 75 80 85 90 {
quietly rdrobust exit_exam entrance_exam, c(`c')
di &amp;quot;`c'&amp;quot; _col(10) %9.3f e(tau_cl) _col(23) %9.3f e(pv_cl)
}
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">Cutoff Coef SE p-value
------ --------- --------- ---------
50 12.728 21.302 0.550
55 0.557 3.052 0.855
60 0.569 3.193 0.859
65 3.296 1.742 0.058
70 * -8.579 1.617 0.000
75 -1.548 1.691 0.360
80 -1.095 1.472 0.457
85 0.817 1.605 0.611
90 -0.540 1.900 0.776
&lt;/code>&lt;/pre>
&lt;p>&lt;img src="stata_rd_fig5_placebo_cutoffs.png" alt="Point estimates and 95% confidence intervals from rdrobust at 9 different cutoff values, with the true cutoff at 70 highlighted in orange.">
&lt;em>Figure 5: Placebo cutoff test. Only the true cutoff at 70 (orange diamond) shows a significant effect; all placebo cutoffs straddle zero.&lt;/em>&lt;/p>
&lt;p>The results are unambiguous: the true cutoff of 70 is the &lt;strong>only&lt;/strong> value with a significant discontinuity (p &amp;lt; 0.001). All eight placebo cutoffs have p-values above 0.05, ranging from 0.058 (cutoff 65) to 0.859 (cutoff 60). The marginally non-significant result at cutoff 65 (p = 0.058) likely reflects spillover from the true discontinuity &amp;mdash; since the optimal bandwidth is about 10 points, the estimation windows for cutoff 65 and cutoff 70 overlap. The placebo cutoff figure makes this visually clear: only the orange diamond (true cutoff at 70) has a confidence interval that excludes zero.&lt;/p>
&lt;hr>
&lt;h2 id="10-discussion">10. Discussion&lt;/h2>
&lt;p>Returning to our central question: &lt;strong>Did the tutoring program improve student performance on the exit exam?&lt;/strong>&lt;/p>
&lt;p>The evidence overwhelmingly says &lt;strong>yes&lt;/strong>. Across all estimation approaches, the tutoring program raised exit exam scores by approximately &lt;strong>9 to 11 points&lt;/strong> at the cutoff:&lt;/p>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th>Approach&lt;/th>
&lt;th>Estimate&lt;/th>
&lt;th>95% CI&lt;/th>
&lt;th>p-value&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>Parametric OLS (linear)&lt;/td>
&lt;td>+10.80&lt;/td>
&lt;td>[9.22, 12.38]&lt;/td>
&lt;td>&amp;lt; 0.001&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Parametric OLS (quadratic)&lt;/td>
&lt;td>+9.22&lt;/td>
&lt;td>[6.87, 11.57]&lt;/td>
&lt;td>&amp;lt; 0.001&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Nonparametric (rdrobust)&lt;/td>
&lt;td>+8.58&lt;/td>
&lt;td>[4.54, 12.14]&lt;/td>
&lt;td>&amp;lt; 0.001&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;p>This 9&amp;ndash;11 point improvement represents roughly &lt;strong>13&amp;ndash;16% of the mean exit exam score&lt;/strong> (66.2 points) &amp;mdash; a substantial educational gain.&lt;/p>
&lt;p>The design passes all validity checks. The RDD is perfectly sharp (100% compliance), the McCrary density test shows no evidence of score manipulation (p = 0.58), the estimate is stable across bandwidths (ranging from -8.20 to -9.16 across BW = 5 to 20), robust to kernel choice (triangular, uniform, Epanechnikov all yield similar results), and placebo cutoff tests confirm that the discontinuity is unique to the true cutoff of 70.&lt;/p>
&lt;p>For policymakers, these results suggest that rule-based tutoring programs &amp;mdash; where eligibility is determined by a test score cutoff &amp;mdash; can be effective. An improvement of 9&amp;ndash;11 points on an exit exam is meaningful: it could be the difference between passing and failing, or between qualifying for an advanced program and being held back. The sharp rule-based assignment also makes the program straightforward to implement and evaluate.&lt;/p>
&lt;p>However, a key limitation of RDD is that the estimated effect is &lt;strong>local&lt;/strong> to the cutoff. We know that tutoring helps students who scored near 70, but we cannot say whether it would help a student who scored 30 or 90. Extrapolating the RDD estimate to the full population of students would require additional assumptions about how the treatment effect varies across the score distribution.&lt;/p>
&lt;hr>
&lt;h2 id="11-summary-and-next-steps">11. Summary and next steps&lt;/h2>
&lt;h3 id="key-takeaways">Key takeaways&lt;/h3>
&lt;ol>
&lt;li>
&lt;p>&lt;strong>Tutoring works.&lt;/strong> The program raised exit exam scores by 9&amp;ndash;11 points at the cutoff (13&amp;ndash;16% of the mean), a substantively large effect that is robust across all specifications.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Sharp RDD is credible here.&lt;/strong> The assignment rule is perfectly enforced (100% compliance), and all validity checks pass &amp;mdash; no manipulation (density test p = 0.58), no spurious discontinuities at placebo cutoffs, and stable estimates across bandwidths.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Parametric and nonparametric methods agree.&lt;/strong> OLS estimates the effect at 10.80 points (full sample), while rdrobust estimates it at 8.58 points (local to the cutoff). The difference reflects scope (global vs. local) rather than disagreement.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>The &lt;code>rdrobust&lt;/code> package makes RDD accessible.&lt;/strong> Data-driven bandwidth selection, robust inference, and built-in plotting tools reduce the number of arbitrary analyst choices.&lt;/p>
&lt;/li>
&lt;/ol>
&lt;h3 id="limitations">Limitations&lt;/h3>
&lt;ul>
&lt;li>The effect is local to the cutoff (LATE), not generalizable to all students&lt;/li>
&lt;li>No covariates are available for additional smoothness checks&lt;/li>
&lt;li>We cannot test for heterogeneous effects by student characteristics&lt;/li>
&lt;/ul>
&lt;h3 id="next-steps">Next steps&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Fuzzy RDD.&lt;/strong> Apply the methods to settings where compliance is imperfect (e.g., students can opt out of the program)&lt;/li>
&lt;li>&lt;strong>Covariates.&lt;/strong> Incorporate pre-treatment variables to improve precision and run covariate smoothness tests&lt;/li>
&lt;li>&lt;strong>Heterogeneity.&lt;/strong> Explore whether the tutoring effect varies by student characteristics using subgroup analysis&lt;/li>
&lt;/ul>
&lt;hr>
&lt;h2 id="12-exercises">12. Exercises&lt;/h2>
&lt;ol>
&lt;li>
&lt;p>&lt;strong>Alternative polynomials.&lt;/strong> Re-estimate the parametric model using cubic and quartic polynomials. Do the treatment effect estimates change substantially? What happens to the standard errors as the polynomial order increases?&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Asymmetric bandwidths.&lt;/strong> Run &lt;code>rdrobust&lt;/code> with different bandwidths on each side of the cutoff (e.g., &lt;code>h(8 12)&lt;/code>). Does allowing asymmetric bandwidths change the estimate or improve precision?&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Donut hole RDD.&lt;/strong> Some researchers worry that observations exactly at the cutoff are unusual. Re-estimate the effect after dropping students who scored exactly 70 (a &amp;ldquo;donut hole&amp;rdquo; approach). Does the estimate change?&lt;/p>
&lt;/li>
&lt;/ol>
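&lt;p>As a starting point for exercises 2 and 3, here is a minimal Stata sketch. The variable names &lt;code>exit_exam&lt;/code> and &lt;code>entrance_exam&lt;/code> are placeholders &amp;mdash; substitute the outcome and running variable names used in your dataset:&lt;/p>
&lt;pre>&lt;code>* Exercise 2: asymmetric bandwidths (8 points below the cutoff, 12 above)
rdrobust exit_exam entrance_exam, c(70) h(8 12)

* Exercise 3: donut hole -- exclude students who scored exactly at the cutoff
rdrobust exit_exam entrance_exam if entrance_exam != 70, c(70)
&lt;/code>&lt;/pre>
&lt;p>The &lt;code>h()&lt;/code> option accepts either one bandwidth (applied on both sides) or two (left, then right of the cutoff); the &lt;code>if&lt;/code> qualifier implements the donut hole by dropping the boundary observations before estimation.&lt;/p>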
&lt;hr>
&lt;h2 id="references">References&lt;/h2>
&lt;ol>
&lt;li>&lt;a href="https://rdpackages.github.io/references/Cattaneo-Idrobo-Titiunik_2019_CUP.pdf" target="_blank" rel="noopener">Cattaneo, M. D., Idrobo, N., and Titiunik, R. (2019). A Practical Introduction to Regression Discontinuity Designs: Foundations. Cambridge University Press.&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://doi.org/10.1080/01621459.2019.1635480" target="_blank" rel="noopener">Cattaneo, M. D., Jansson, M., and Ma, X. (2020). Simple Local Polynomial Density Estimators. Journal of the American Statistical Association, 115(531), 1449-1455.&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://rdpackages.github.io/rdrobust/" target="_blank" rel="noopener">rdrobust &amp;ndash; Stata package for RDD estimation&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://rdpackages.github.io/rddensity/" target="_blank" rel="noopener">rddensity &amp;ndash; Stata package for density discontinuity testing&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://evalsp23.classes.andrewheiss.com/example/rdd.html" target="_blank" rel="noopener">Heiss, A. (2023). Regression Discontinuity. Program Evaluation for Public Service.&lt;/a>&lt;/li>
&lt;/ol>
&lt;hr>
&lt;h2 id="ai-acknowledgement">AI acknowledgement&lt;/h2>
&lt;p>This tutorial was written with the assistance of Claude (Anthropic), which helped draft the narrative, interpretations, and code documentation. All statistical analyses were executed in Stata 18.0, and all results were verified against the actual Stata output.&lt;/p></description></item></channel></rss>