<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Semiparametric Methods | Carlos Mendez</title><link>https://carlos-mendez.org/category/semiparametric-methods/</link><atom:link href="https://carlos-mendez.org/category/semiparametric-methods/index.xml" rel="self" type="application/rss+xml"/><description>Semiparametric Methods</description><generator>Wowchemy (https://wowchemy.com)</generator><language>en-us</language><copyright>© 2018–2026 Carlos Mendez. All rights reserved.</copyright><lastBuildDate>Sun, 14 Jun 2026 00:00:00 +0000</lastBuildDate><image><url>https://carlos-mendez.org/media/icon_huedfae549300b4ca5d201a9bd09a3ecd5_79625_512x512_fill_lanczos_center_3.png</url><title>Semiparametric Methods</title><link>https://carlos-mendez.org/category/semiparametric-methods/</link></image><item><title>Spatial Inequality and the Kuznets Curve: Parametric and Semiparametric Estimates in R</title><link>https://carlos-mendez.org/post/r_kuznets/</link><pubDate>Sun, 14 Jun 2026 00:00:00 +0000</pubDate><guid>https://carlos-mendez.org/post/r_kuznets/</guid><description>&lt;h2 id="abstract">Abstract&lt;/h2>
&lt;p>Why do some countries have huge gaps between their richest and poorest regions while others are remarkably even? Lessmann (2014) revisits a classic idea from Kuznets (1955) and Williamson (1965): as countries develop, spatial inequality first &lt;em>rises&lt;/em>, then &lt;em>falls&lt;/em> — an inverted-U. This tutorial replicates that study in R on a &lt;strong>synthetic&lt;/strong> dataset, so the entire data-generating process is open and reproducible. We simulate regional GDP per capita for 56 countries over 1980–2009, compute the population-weighted coefficient of variation (WCV) of regional income from those regions, and estimate the relationship with cross-section OLS, two-way fixed effects via &lt;code>fixest&lt;/code>, and the Robinson (1988) and Baltagi–Li (2002) semiparametric estimators. The cross-section recovers a significant inverted-U with a high-income upturn — a cubic whose turning points sit at about \$2,100 and \$31,000 of GDP per capita — while the within-country panel shows a &lt;em>clean&lt;/em> inverted-U (the cubic term is insignificant). Spatial inequality correlates with personal (Gini) inequality at about 0.32, and a sectoral channel — the non-agricultural share of output — reproduces the same curve. The practical lesson is that wide regional gaps are, to a first approximation, a transitional feature of development that tends to narrow as economies mature; for learners, the post is a hands-on tour of measurement, fixed effects, polynomial specification, and flexible semiparametric regression.&lt;/p>
&lt;h2 id="1-overview">1. Overview&lt;/h2>
&lt;p>&lt;strong>The case-study question.&lt;/strong> &lt;em>Is the link between spatial inequality and economic development an inverted-U — and does it turn N-shaped (rising again) at very high income?&lt;/em> Lessmann (2014) assembled a hard-to-find panel of regional accounts to answer it. We cannot share that proprietary data, so we do the next best thing for teaching: &lt;strong>build a synthetic world&lt;/strong> whose data-generating process reproduces the paper&amp;rsquo;s findings, and walk through every estimator on it.&lt;/p>
&lt;p>Why does this matter? Wide regional gaps are not just an accounting curiosity. Interregional inequality often travels with ethnic and political tension, and in extreme cases raises the risk of internal conflict. Understanding &lt;em>when&lt;/em> such gaps widen and &lt;em>when&lt;/em> they close is directly useful for regional policy.&lt;/p>
&lt;p>&lt;strong>Learning objectives.&lt;/strong> By the end you will be able to:&lt;/p>
&lt;ol>
&lt;li>Compute the &lt;strong>weighted coefficient of variation (WCV)&lt;/strong> of regional income and explain why it is population-weighted.&lt;/li>
&lt;li>Estimate &lt;strong>polynomial OLS&lt;/strong> with heteroskedasticity-robust (White) standard errors and read an inverted-U off the coefficients.&lt;/li>
&lt;li>Fit &lt;strong>two-way fixed effects&lt;/strong> with &lt;code>fixest::feols&lt;/code>, and explain why country and year fixed effects change the story.&lt;/li>
&lt;li>Solve for the &lt;strong>turning points&lt;/strong> of a cubic and convert them to dollar thresholds.&lt;/li>
&lt;li>Read the &lt;strong>Robinson&lt;/strong> and &lt;strong>Baltagi–Li&lt;/strong> semiparametric partial-fit curves and say how they differ from a polynomial.&lt;/li>
&lt;/ol>
&lt;pre>&lt;code class="language-mermaid">graph LR
A[&amp;quot;Simulate regional GDP&amp;quot;] --&amp;gt; B[&amp;quot;Compute WCV&amp;quot;]
B --&amp;gt; C[&amp;quot;Cross-section OLS&amp;lt;br/&amp;gt;Table 2&amp;quot;]
B --&amp;gt; D[&amp;quot;Two-way FE&amp;lt;br/&amp;gt;Table 3&amp;quot;]
C --&amp;gt; E[&amp;quot;Turning points&amp;quot;]
E --&amp;gt; J[&amp;quot;Discriminant test&amp;quot;]
C --&amp;gt; F[&amp;quot;Robinson semiparametric&amp;lt;br/&amp;gt;Fig 4&amp;quot;]
D --&amp;gt; G[&amp;quot;Baltagi–Li semiparametric&amp;lt;br/&amp;gt;Fig 5&amp;quot;]
B --&amp;gt; H[&amp;quot;Sectoral channel&amp;lt;br/&amp;gt;Table 6&amp;quot;]
B --&amp;gt; I[&amp;quot;Robustness&amp;quot;]
style A fill:#1f2b5e,stroke:#6a9bcc,color:#e8ecf2
style J fill:#1f2b5e,stroke:#00d4c8,color:#e8ecf2
style B fill:#1f2b5e,stroke:#00d4c8,color:#e8ecf2
style C fill:#1f2b5e,stroke:#6a9bcc,color:#e8ecf2
style D fill:#1f2b5e,stroke:#d97757,color:#e8ecf2
style E fill:#1f2b5e,stroke:#6a9bcc,color:#e8ecf2
style F fill:#1f2b5e,stroke:#6a9bcc,color:#e8ecf2
style G fill:#1f2b5e,stroke:#d97757,color:#e8ecf2
style H fill:#1f2b5e,stroke:#00d4c8,color:#e8ecf2
style I fill:#1f2b5e,stroke:#6a9bcc,color:#e8ecf2
&lt;/code>&lt;/pre>
&lt;p>The pipeline above is the whole post in one picture: simulate regions, compute the inequality index, then estimate the development–inequality relationship four ways (parametric and semiparametric, cross-section and panel), and probe the sectoral channel and robustness.&lt;/p>
&lt;h3 id="key-concepts-at-a-glance">Key concepts at a glance&lt;/h3>
&lt;p>The post reuses a small vocabulary. Each concept below has a &lt;strong>definition&lt;/strong> (always visible) plus an &lt;strong>example&lt;/strong> and &lt;strong>analogy&lt;/strong> behind clickable cards — open them when a term feels slippery.&lt;/p>
&lt;p>&lt;strong>1. Weighted coefficient of variation (WCV).&lt;/strong> $\mathrm{WCV} = \frac{1}{\bar{y}}\left[\sum_{j} p_j\,(\bar{y}-y_j)^2\right]^{1/2}$: the population-weighted spread of regional GDP per capita, divided by the country mean. Scale-free, so it compares countries of any income level.&lt;/p>
&lt;div class="concept-pair">
&lt;details class="concept-card concept-example">&lt;summary>Example&lt;/summary>
&lt;p>A country with a rich capital (\$28,000, 35% of people) and a poorer hinterland (\$12,000, 65%) has WCV ≈ 0.43.&lt;/p>
&lt;/details>
&lt;details class="concept-card concept-analogy">&lt;summary>Analogy&lt;/summary>
&lt;p>Like a &amp;ldquo;spread score&amp;rdquo; for a class where bigger groups of students count more toward the average gap.&lt;/p>
&lt;/details>
&lt;/div>
&lt;p>&lt;strong>2. Inverted-U / Kuznets curve.&lt;/strong> The hypothesis that inequality rises with development, peaks, then falls — tracing an upside-down U.&lt;/p>
&lt;div class="concept-pair">
&lt;details class="concept-card concept-example">&lt;summary>Example&lt;/summary>
&lt;p>In §5 the quadratic gives a positive linear term and a negative squared term — the algebraic signature of an inverted-U.&lt;/p>
&lt;/details>
&lt;details class="concept-card concept-analogy">&lt;summary>Analogy&lt;/summary>
&lt;p>A roller-coaster hill: climb during industrialisation, crest, then descend as the modern economy spreads out.&lt;/p>
&lt;/details>
&lt;/div>
&lt;p>&lt;strong>3. Between vs within variation.&lt;/strong> &lt;em>Between&lt;/em> compares different countries; &lt;em>within&lt;/em> compares one country with itself over time. Cross-section regressions use between variation; panel fixed effects use within variation.&lt;/p>
&lt;div class="concept-pair">
&lt;details class="concept-card concept-example">&lt;summary>Example&lt;/summary>
&lt;p>The high-income &lt;em>upturn&lt;/em> shows up between countries (§5) but vanishes within countries (§6) — the central contrast of the study.&lt;/p>
&lt;/details>
&lt;details class="concept-card concept-analogy">&lt;summary>Analogy&lt;/summary>
&lt;p>Comparing different students&amp;rsquo; heights (between) vs tracking one student as they grow (within).&lt;/p>
&lt;/details>
&lt;/div>
&lt;p>&lt;strong>4. Two-way fixed effects (TWFE).&lt;/strong> Adding a dummy for every country &lt;em>and&lt;/em> every year, so the income effect is identified only from the variation left after removing each country&amp;rsquo;s own average and the shocks common to all countries in a given year.&lt;/p>
&lt;div class="concept-pair">
&lt;details class="concept-card concept-example">&lt;summary>Example&lt;/summary>
&lt;p>&lt;code>feols(wcv ~ lnGDP + I(lnGDP^2) | country + year)&lt;/code> — the &lt;code>| country + year&lt;/code> part absorbs both sets of dummies.&lt;/p>
&lt;/details>
&lt;details class="concept-card concept-analogy">&lt;summary>Analogy&lt;/summary>
&lt;p>Grading each student against their own past, and against everyone&amp;rsquo;s average that semester — removing fixed advantages.&lt;/p>
&lt;/details>
&lt;/div>
&lt;p>&lt;strong>5. Polynomial specification.&lt;/strong> Entering income as $Y, Y^2, Y^3$ lets a straight-line model bend into curves — quadratic for an inverted-U, cubic for an N-shape.&lt;/p>
&lt;div class="concept-pair">
&lt;details class="concept-card concept-example">&lt;summary>Example&lt;/summary>
&lt;p>Column (5) of Table 2 adds $Y^3$; its positive coefficient produces the high-income upturn.&lt;/p>
&lt;/details>
&lt;details class="concept-card concept-analogy">&lt;summary>Analogy&lt;/summary>
&lt;p>Adding hinges to a ruler so it can follow a winding road instead of cutting straight across.&lt;/p>
&lt;/details>
&lt;/div>
&lt;p>&lt;strong>6. Turning points.&lt;/strong> The income levels where the curve changes direction — found by setting the derivative to zero: $\beta_1 + 2\beta_2 Y + 3\beta_3 Y^2 = 0$.&lt;/p>
&lt;div class="concept-pair">
&lt;details class="concept-card concept-example">&lt;summary>Example&lt;/summary>
&lt;p>Our cubic peaks at ln(GDP) ≈ 7.7 (≈ \$2,100) and troughs at ≈ 10.4 (≈ \$31,000).&lt;/p>
&lt;/details>
&lt;details class="concept-card concept-analogy">&lt;summary>Analogy&lt;/summary>
&lt;p>The crest and the valley of the roller-coaster — where the track is momentarily flat.&lt;/p>
&lt;/details>
&lt;/div>
&lt;p>&lt;strong>7. Semiparametric / partially-linear model.&lt;/strong> $\mathrm{WCV} = \alpha + f(Y) + \gamma X + \epsilon$: the controls $X$ enter linearly, but the income effect $f(Y)$ is an unknown smooth curve estimated from the data instead of forced into a polynomial.&lt;/p>
&lt;div class="concept-pair">
&lt;details class="concept-card concept-example">&lt;summary>Example&lt;/summary>
&lt;p>The Robinson estimator (§8) and the Baltagi–Li B-spline (§9) draw $f(Y)$ as a flexible curve with a confidence band.&lt;/p>
&lt;/details>
&lt;details class="concept-card concept-analogy">&lt;summary>Analogy&lt;/summary>
&lt;p>Tracing a coastline freehand instead of approximating it with a few straight rulers.&lt;/p>
&lt;/details>
&lt;/div>
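&lt;p>The double-residual logic behind a partially-linear fit can be sketched on simulated data. This is a minimal sketch under stated assumptions, not the post&amp;rsquo;s §8 code: base-R &lt;code>loess&lt;/code> is swapped in for the kernel smoother from &lt;code>np&lt;/code>, and the curve &lt;code>f_true&lt;/code>, the slope of 0.2, and every variable name are hypothetical.&lt;/p>
&lt;pre>&lt;code class="language-r">set.seed(3)
n &amp;lt;- 500
z &amp;lt;- runif(n, 6, 11)                      # stand-in for ln(GDP per capita)
x &amp;lt;- 0.5 * z + rnorm(n)                   # one linear control, correlated with z
f_true &amp;lt;- function(z) -0.02 * (z - 9)^2   # hypothetical inverted-U in z
y &amp;lt;- f_true(z) + 0.2 * x + rnorm(n, sd = 0.1)

# Step 1: smooth y and x on z, keep the residuals
ry &amp;lt;- y - fitted(loess(y ~ z))
rx &amp;lt;- x - fitted(loess(x ~ z))

# Step 2: residual-on-residual OLS recovers the linear coefficient
gamma_hat &amp;lt;- unname(coef(lm(ry ~ 0 + rx))[1])

# Step 3: smooth what is left of y on z to trace out f(z)
partial &amp;lt;- y - gamma_hat * x
f_hat &amp;lt;- fitted(loess(partial ~ z))

gamma_hat
&lt;/code>&lt;/pre>
&lt;p>If the sketch behaves as intended, &lt;code>gamma_hat&lt;/code> should land near the simulated 0.2 and &lt;code>f_hat&lt;/code> should track &lt;code>f_true(z)&lt;/code> without any polynomial being imposed on the income effect.&lt;/p>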
&lt;p>&lt;strong>8. Omitted-variable bias.&lt;/strong> When a left-out factor correlated with both income and inequality distorts the estimated relationship; fixed effects defend against the &lt;em>time-invariant&lt;/em> version of it.&lt;/p>
&lt;div class="concept-pair">
&lt;details class="concept-card concept-example">&lt;summary>Example&lt;/summary>
&lt;p>Geography (mountains, coasts) drives spatial inequality but is hard to measure; country fixed effects absorb all of it at once.&lt;/p>
&lt;/details>
&lt;details class="concept-card concept-analogy">&lt;summary>Analogy&lt;/summary>
&lt;p>Blaming coffee for poor sleep when it&amp;rsquo;s really the late-night screen time that travels with it.&lt;/p>
&lt;/details>
&lt;/div>
&lt;p>&lt;strong>9. Discriminant of the cubic.&lt;/strong> A single number, $D = \beta_2^2 - 3\beta_1\beta_3$, that tells you whether a fitted cubic has two real turning points ($D&amp;gt;0$), one inflection ($D=0$), or none ($D&amp;lt;0$). It is computed from the coefficients, so it answers &amp;ldquo;does the curve bend?&amp;rdquo; — a different question from &amp;ldquo;is each term significant?&amp;rdquo;&lt;/p>
&lt;div class="concept-pair">
&lt;details class="concept-card concept-example">&lt;summary>Example&lt;/summary>
&lt;p>In §7 the cross-section cubic has $D = +0.0055 &amp;gt; 0$ with both turning points in range (genuine N-shape), while the panel cubic&amp;rsquo;s implied turning points fall outside the data.&lt;/p>
&lt;/details>
&lt;details class="concept-card concept-analogy">&lt;summary>Analogy&lt;/summary>
&lt;p>A road can curve gently yet never actually turn back; the discriminant is the test for whether it makes a genuine U-turn or just leans.&lt;/p>
&lt;/details>
&lt;/div>
&lt;h2 id="2-setup-and-the-synthetic-data-generating-process">2. Setup and the synthetic data-generating process&lt;/h2>
&lt;h3 id="21-packages-and-theme">2.1 Packages and theme&lt;/h3>
&lt;p>We lean on &lt;code>fixest&lt;/code> for fixed effects, &lt;code>np&lt;/code> for the Robinson estimator, &lt;code>splines&lt;/code> for the Baltagi–Li B-spline, &lt;code>sandwich&lt;/code>/&lt;code>lmtest&lt;/code> for White standard errors, and &lt;code>ggplot2&lt;/code> for dark-themed figures.&lt;/p>
&lt;pre>&lt;code class="language-r">set.seed(123)
pacman::p_load(dplyr, tidyr, ggplot2, scales, patchwork, fixest, sandwich, lmtest,
splines, np, modelsummary, gt, webshot2, gridExtra)
options(np.messages = FALSE)
&lt;/code>&lt;/pre>
&lt;h3 id="22-simulating-regional-gdp-and-a-country-panel">2.2 Simulating regional GDP and a country panel&lt;/h3>
&lt;p>The key design choice is that &lt;strong>the WCV is computed, not assumed&lt;/strong>. For each of 56 synthetic countries we build a realistic territorial structure — the actual number of regions and land areas from the paper&amp;rsquo;s appendix — then draw regional GDP per capita and a population share for each region. We engineer two layers into the data: a &lt;strong>within-country&lt;/strong> inverted-U (how a country&amp;rsquo;s regional spread evolves as it develops) and a &lt;strong>between-country&lt;/strong> cubic that lives in a time-invariant country term. This separation is what lets the panel show a clean inverted-U while the cross-section shows the N-shape.&lt;/p>
&lt;pre>&lt;code class="language-r"># region j in country i, year t: y_ijt = country_mean × exp(δ_it · z_ij − δ_it² / 2)
# z_ij is a persistent regional &amp;quot;position&amp;quot; (a rich region stays rich);
# δ_it (the log-dispersion) follows the structural inverted-U in development.
# The − δ² / 2 term offsets the lognormal mean shift, so regions average to exp(lnGDP) for standard-normal z.
delta &amp;lt;- sqrt(log(1 + target_wcv^2)) # lognormal-CV inversion: CV² = exp(δ²) − 1
y_reg &amp;lt;- exp(lnGDP) * exp(delta * z - 0.5 * delta^2)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">Simulating regional micro-data for 56 countries ...
annual obs N=890 | 5-year obs N=212 | cross-section N=56
&lt;/code>&lt;/pre>
&lt;p>The simulated panel has &lt;strong>890 annual observations&lt;/strong>, &lt;strong>212 five-year cells&lt;/strong>, and &lt;strong>56 countries&lt;/strong> in the cross-section — close to the paper&amp;rsquo;s 915 / 207 / 56. The unbalanced shape is deliberate: rich OECD economies have long, dense coverage; developing countries have short, gappy series, exactly as in the real data.&lt;/p>
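&lt;p>The lognormal-CV inversion in the snippet above can be checked in isolation. The following is a minimal sketch, assuming standard-normal regional positions, equal population shares, and a deliberately large number of regions so that sampling noise is small; &lt;code>target_wcv = 0.30&lt;/code>, &lt;code>lnGDP = 9&lt;/code> and &lt;code>n_reg&lt;/code> are hypothetical values, not the post&amp;rsquo;s calibration.&lt;/p>
&lt;pre>&lt;code class="language-r">set.seed(42)
wcv_fun &amp;lt;- function(y, p) {               # population-weighted CV, as defined in §3
  ybar &amp;lt;- sum(p * y)
  sqrt(sum(p * (ybar - y)^2)) / ybar
}

target_wcv &amp;lt;- 0.30                        # hypothetical target dispersion
lnGDP      &amp;lt;- 9                           # hypothetical country income (in logs)
n_reg      &amp;lt;- 5000                        # many regions, so sampling noise is small
z &amp;lt;- rnorm(n_reg)                         # assumed standard-normal regional positions
p &amp;lt;- rep(1 / n_reg, n_reg)                # equal population shares for simplicity

delta &amp;lt;- sqrt(log(1 + target_wcv^2))      # lognormal-CV inversion
y_reg &amp;lt;- exp(lnGDP) * exp(delta * z - 0.5 * delta^2)

wcv_sim &amp;lt;- wcv_fun(y_reg, p)
c(target = target_wcv, simulated = wcv_sim,
  mean_ratio = mean(y_reg) / exp(lnGDP))
&lt;/code>&lt;/pre>
&lt;p>With this many regions the computed WCV should sit close to the target and the mean ratio close to 1. With only a handful of regions per country, as in the actual simulation, the computed WCV will scatter around its target, which is consistent with treating the index as computed instead of assumed.&lt;/p>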
&lt;h2 id="3-measuring-spatial-inequality-the-wcv">3. Measuring spatial inequality: the WCV&lt;/h2>
&lt;p>Lessmann measures spatial inequality with the population-weighted coefficient of variation of regional GDP per capita:&lt;/p>
&lt;p>$$\mathrm{WCV}_{i,t} = \frac{1}{\bar{y}}\left[\sum_{j=1}^{n} p_j\,(\bar{y} - y_j)^2\right]^{1/2}$$&lt;/p>
&lt;p>where $\bar{y} = \sum_{j} p_j y_j$ is the country&amp;rsquo;s population-weighted average of regional GDP per capita, $y_j$ is region $j$&amp;rsquo;s GDP per capita, $p_j$ is region $j$&amp;rsquo;s share of the country&amp;rsquo;s population, and $n$ is the number of regions. The population weighting is the crucial feature: a tiny, very rich (or very poor) region barely moves the index, while a populous region counts a lot.&lt;/p>
&lt;pre>&lt;code class="language-r">wcv_fun &amp;lt;- function(y, p) {
ybar &amp;lt;- sum(p * y) # population-weighted mean
sqrt(sum(p * (ybar - y)^2)) / ybar # weighted SD / mean
}
toy &amp;lt;- data.frame(gdp_pc = c(28000, 12000), pop_share = c(0.35, 0.65))
wcv_fun(toy$gdp_pc, toy$pop_share)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">Worked WCV example: ybar = 17600, WCV = 0.434
&lt;/code>&lt;/pre>
&lt;p>&lt;img src="r_kuznets_01_wcv_explainer.png" alt="How the WCV is built from a two-region example">&lt;/p>
&lt;p>A rich capital region at \$28,000 (35% of the population) and a poorer hinterland at \$12,000 (65%) give a population-weighted mean of \$17,600 and a &lt;strong>WCV of 0.434&lt;/strong>. Because the larger, poorer region carries more weight, the index reflects how &lt;em>most people&lt;/em> experience the regional gap — not just the extremes. Mapping the same calculation across all 56 synthetic countries reproduces the familiar geography of spatial inequality.&lt;/p>
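&lt;p>To see what the population weights buy, one can add a tiny, very rich enclave to the two-region example and compare the weighted index with an unweighted coefficient of variation. This is an illustrative sketch: the enclave (\$60,000 per capita, 0.2% of the population) and the helper &lt;code>cv_unweighted&lt;/code> are hypothetical additions, not part of the original analysis.&lt;/p>
&lt;pre>&lt;code class="language-r">wcv_fun &amp;lt;- function(y, p) {
  ybar &amp;lt;- sum(p * y)                        # population-weighted mean
  sqrt(sum(p * (ybar - y)^2)) / ybar        # weighted SD / mean
}
cv_unweighted &amp;lt;- function(y) sqrt(mean((y - mean(y))^2)) / mean(y)

y2 &amp;lt;- c(28000, 12000)                      # the two-region example
p2 &amp;lt;- c(0.35, 0.65)
y3 &amp;lt;- c(28000, 12000, 60000)               # add a hypothetical rich enclave
p3 &amp;lt;- c(0.349, 0.649, 0.002)               # holding 0.2% of the population

change_weighted   &amp;lt;- wcv_fun(y3, p3) - wcv_fun(y2, p2)
change_unweighted &amp;lt;- cv_unweighted(y3) - cv_unweighted(y2)
c(weighted = change_weighted, unweighted = change_unweighted)
&lt;/code>&lt;/pre>
&lt;p>The weighted index moves by roughly 0.01, while the unweighted version jumps by roughly 0.20: the enclave is an extreme value, but very few people live there.&lt;/p>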
&lt;p>&lt;img src="r_kuznets_02_wcv_by_region.png" alt="Mean WCV by World Bank region">&lt;/p>
&lt;p>High-income North America and Europe show the lowest spatial inequality, while East Asia, Latin America and Sub-Saharan Africa show the highest — the cross-regional ranking Lessmann reports in Table 1.&lt;/p>
&lt;h2 id="4-spatial-vs-personal-inequality-fig-3">4. Spatial vs personal inequality (Fig 3)&lt;/h2>
&lt;p>Before modelling development, it is worth asking how spatial inequality relates to the more familiar &lt;strong>personal&lt;/strong> inequality (the household-income Gini). If they were the same thing, studying regions would add nothing.&lt;/p>
&lt;pre>&lt;code class="language-r">fig3_fit &amp;lt;- lm(gini ~ wcv, cs)
coef(fig3_fit); cor(cs$gini, cs$wcv)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">Fig 3: GINI = 0.311 + 0.208 * WCV (t = 2.45), corr = 0.316
&lt;/code>&lt;/pre>
&lt;p>&lt;img src="r_kuznets_03_gini_vs_wcv.png" alt="Spatial inequality predicts personal inequality">&lt;/p>
&lt;p>The slope is positive and significant (&lt;strong>0.208&lt;/strong>, t = 2.45) and the correlation is &lt;strong>0.316&lt;/strong> — close to the paper&amp;rsquo;s 0.324. Spatial inequality explains a real but partial share of personal inequality: the two are related, not interchangeable. A country can have high personal inequality with low regional inequality (the United States) or the reverse (a small, ethnically split economy). That partial overlap is exactly why the rest of the post focuses on the &lt;em>spatial&lt;/em> dimension in its own right.&lt;/p>
&lt;h2 id="5-cross-section-parametric-estimates-table-2">5. Cross-section parametric estimates (Table 2)&lt;/h2>
&lt;p>We start where Williamson (1965) did: a &lt;strong>cross-section&lt;/strong> of countries, using period means over 2000–2009. The estimating equation is a polynomial in development with controls:&lt;/p>
&lt;p>$$\mathrm{WCV}_{i} = \alpha + \sum_{j=1}^{k}\beta_j\,Y_{i}^{\,j} + \gamma X_{i} + \epsilon_{i}$$&lt;/p>
&lt;p>where $Y = \ln(\text{GDP per capita})$. An inverted-U needs $\beta_1 &amp;gt; 0$ and $\beta_2 &amp;lt; 0$. We use &lt;strong>White (HC1) heteroskedasticity-robust&lt;/strong> standard errors to match the paper.&lt;/p>
&lt;pre>&lt;code class="language-r">m1 &amp;lt;- lm(wcv ~ lnGDP, cs) # bivariate
m4 &amp;lt;- lm(wcv ~ lnGDP + I(lnGDP^2) + lnunits + lnarea + area_units +
ethnic + trade_gdp + urbanization + federal, cs) # full controls
m5 &amp;lt;- lm(wcv ~ lnGDP + I(lnGDP^2) + I(lnGDP^3) + lnunits + lnarea +
area_units + ethnic + trade_gdp + urbanization + federal, cs) # + cubic
lmtest::coeftest(m1, vcov = sandwich::vcovHC(m1, &amp;quot;HC1&amp;quot;))
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">CS (1) lnGDP -0.098*** -&amp;gt; -0.092***
CS (4) lnGDP/^2 +0.33*/-0.021* -&amp;gt; 0.338* / -0.020**
CS (5) cubic 3.86**/-0.45**/0.017** -&amp;gt; 4.40***/-0.499***/0.0184***
CS adjR2 0.43/0.66/0.69 -&amp;gt; 0.33/0.67/0.73
&lt;/code>&lt;/pre>
&lt;p>&lt;img src="r_kuznets_table2_crosssection.png" alt="Regression table 2 — cross-section parametric estimates">&lt;/p>
&lt;p>The five specifications tell a story. The &lt;strong>bivariate&lt;/strong> slope is negative (&lt;strong>−0.092***&lt;/strong>): on average, richer countries have &lt;em>lower&lt;/em> spatial inequality — but a straight line hides the structure. Once the controls enter (column 4), the &lt;strong>inverted-U emerges&lt;/strong>: the linear income term turns positive (&lt;strong>+0.338*&lt;/strong>) and the squared term is negative (&lt;strong>−0.020**&lt;/strong>). Adding a cubic (column 5) makes all three income terms significant — &lt;strong>+4.40*** / −0.499*** / +0.0184***&lt;/strong> — and the positive cubic coefficient reveals an &lt;strong>upturn at very high income&lt;/strong> (the N-shape). Every control carries the expected sign: more trade and more regions raise spatial inequality, while federal constitutions and urbanisation lower it.&lt;/p>
&lt;p>&lt;img src="r_kuznets_04_crosssection_polys.png" alt="Cross-section scatter with linear, quadratic and cubic fits">&lt;/p>
&lt;p>The scatter makes the algebra visual: the straight line slopes down, the quadratic bends into an inverted-U, and the cubic adds the high-income upturn among the richest economies. &lt;strong>Interpretation:&lt;/strong> the same data support three different stories depending on the functional form — which is exactly why Lessmann reports all of them and then turns to semiparametric methods that do not force a shape.&lt;/p>
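&lt;p>For readers curious about what the White (HC1) correction does under the hood, the sandwich formula can be written out in a few lines of base R. This is a sketch on simulated data, not the post&amp;rsquo;s &lt;code>cs&lt;/code> dataset: the coefficients 0.34 and −0.02 and the noise pattern are hypothetical, and in practice &lt;code>sandwich::vcovHC&lt;/code> does this work.&lt;/p>
&lt;pre>&lt;code class="language-r">set.seed(1)
n &amp;lt;- 200
x &amp;lt;- runif(n, 6, 11)                                   # stand-in for ln(GDP per capita)
y &amp;lt;- 0.34 * x - 0.02 * x^2 + rnorm(n, sd = 0.02 * x)   # noise grows with x
fit &amp;lt;- lm(y ~ x + I(x^2))

X &amp;lt;- model.matrix(fit)
e &amp;lt;- residuals(fit)
k &amp;lt;- ncol(X)

bread  &amp;lt;- solve(crossprod(X))                # inverse of t(X) %*% X
meat   &amp;lt;- crossprod(X * e)                   # t(X) %*% diag(e^2) %*% X
vc_hc1 &amp;lt;- (n / (n - k)) * bread %*% meat %*% bread   # HC1 small-sample scaling
se_hc1 &amp;lt;- sqrt(diag(vc_hc1))

cbind(estimate     = coef(fit),
      se_classical = sqrt(diag(vcov(fit))),
      se_hc1       = se_hc1)
&lt;/code>&lt;/pre>
&lt;p>The point estimates are untouched; only the standard errors change, because each squared residual gets its own weight in the middle of the sandwich.&lt;/p>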
&lt;h2 id="6-panel-two-way-fixed-effects-table-3">6. Panel two-way fixed effects (Table 3)&lt;/h2>
&lt;h3 id="61-why-fixed-effects">6.1 Why fixed effects?&lt;/h3>
&lt;p>The cross-section compares &lt;em>different&lt;/em> countries, so any unmeasured, time-invariant trait correlated with income — geography, history, ethnic geography — can bias the estimate. A &lt;strong>panel&lt;/strong> lets us compare each country &lt;em>with itself over time&lt;/em> and absorb all such traits with country dummies.&lt;/p>
&lt;p>&lt;img src="r_kuznets_05_panel_spaghetti.png" alt="Within-country trajectories motivate fixed effects">&lt;/p>
&lt;p>Each grey line is one country&amp;rsquo;s path; the coloured lines highlight China, India, Russia, Brazil, the United States and Bolivia. Countries sit at very different inequality &lt;em>levels&lt;/em> for reasons unrelated to their income trajectory — and those level differences are precisely what country fixed effects remove.&lt;/p>
&lt;h3 id="62-the-fixestfeols-specification">6.2 The &lt;code>fixest::feols&lt;/code> specification&lt;/h3>
&lt;p>The panel model adds country &lt;em>and&lt;/em> year fixed effects:&lt;/p>
&lt;p>$$\mathrm{WCV}_{i,t} = \beta_1 Y_{i,t} + \beta_2 Y_{i,t}^2 + \gamma X_{i,t} + \alpha_i + \mu_t + \epsilon_{i,t}$$&lt;/p>
&lt;p>In &lt;code>fixest&lt;/code>, the fixed effects go after a vertical bar, and &lt;code>vcov = &amp;quot;hetero&amp;quot;&lt;/code> reproduces the paper&amp;rsquo;s White standard errors (clustering by country is the modern alternative):&lt;/p>
&lt;pre>&lt;code class="language-r">fa2 &amp;lt;- feols(wcv ~ lnGDP + I(lnGDP^2) + trade_gdp + urbanization |
country + year, data = annual, vcov = &amp;quot;hetero&amp;quot;)
fa3 &amp;lt;- feols(wcv ~ lnGDP + I(lnGDP^2) + I(lnGDP^3) + trade_gdp + urbanization |
country + year, data = annual, vcov = &amp;quot;hetero&amp;quot;)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">PAN (2) lnGDP/^2 0.345**/-0.018** -&amp;gt; 0.394**/-0.0211** ; cubic n.s. -&amp;gt; -0.0008
&lt;/code>&lt;/pre>
&lt;p>&lt;img src="r_kuznets_table3_panel.png" alt="Regression table 3 — panel two-way fixed-effects estimates">&lt;/p>
&lt;p>The &lt;strong>within-country&lt;/strong> relationship is a clean inverted-U: the quadratic gives &lt;strong>+0.394**&lt;/strong> and &lt;strong>−0.0211**&lt;/strong>, matching the paper. Crucially, the &lt;strong>cubic term is insignificant&lt;/strong> (−0.0008, t = −0.26): there is &lt;em>no&lt;/em> high-income upturn within countries. This is the study&amp;rsquo;s central contrast — the upturn we saw in the cross-section is a &lt;em>between-country&lt;/em> phenomenon (rich service economies differ from rich manufacturing ones), not something a single country experiences as it grows.&lt;/p>
&lt;p>&lt;img src="r_kuznets_06_twfe_fit.png" alt="Within-country inverted-U from the TWFE model">&lt;/p>
&lt;p>The fitted TWFE quadratic peaks where its slope is zero, at $Y^{\star} = -\beta_1/(2\beta_2) = 0.394/(2 \times 0.0211) \approx 9.3$, or roughly \$11,000 of GDP per capita: as a typical country develops past that point, its regional gaps start to close. &lt;strong>Interpretation:&lt;/strong> fixed effects do not just tidy up standard errors; they change the substantive conclusion about whether the upturn is real for any given country.&lt;/p>
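&lt;p>What the &lt;code>| country + year&lt;/code> part absorbs can be made concrete with a small balanced panel in base R. The sketch below is an illustration under stated assumptions (simulated data, a true slope of 0.3, hypothetical variable names); it compares pooled OLS, explicit country and year dummies, and the double-demeaned &amp;ldquo;within&amp;rdquo; regression, which coincide for the slope only because the toy panel is balanced.&lt;/p>
&lt;pre>&lt;code class="language-r">set.seed(2)
n_c &amp;lt;- 20; n_t &amp;lt;- 10
d &amp;lt;- expand.grid(country = 1:n_c, year = 1:n_t)
alpha &amp;lt;- rnorm(n_c)[d$country]                 # time-invariant country trait
mu    &amp;lt;- rnorm(n_t)[d$year]                    # shock common to all countries in a year
d$x &amp;lt;- 8 + 0.5 * alpha + rnorm(nrow(d))        # income is correlated with the trait
d$y &amp;lt;- 0.3 * d$x + alpha + mu + rnorm(nrow(d), sd = 0.1)

# Pooled OLS ignores the country trait
b_pooled &amp;lt;- unname(coef(lm(y ~ x, data = d))[&amp;quot;x&amp;quot;])

# Least-squares dummy variables: one dummy per country and per year
b_lsdv &amp;lt;- unname(coef(lm(y ~ x + factor(country) + factor(year), data = d))[&amp;quot;x&amp;quot;])

# Double demeaning: subtract country and year means, add back the grand mean
dm &amp;lt;- function(v) v - ave(v, d$country) - ave(v, d$year) + mean(v)
b_within &amp;lt;- unname(coef(lm(dm(d$y) ~ 0 + dm(d$x)))[1])

c(pooled = b_pooled, lsdv = b_lsdv, within = b_within)
&lt;/code>&lt;/pre>
&lt;p>The dummy and demeaned estimates agree to numerical precision and sit near 0.3, while the pooled slope is pulled away by the omitted country trait. On an unbalanced panel like the one in this post, simple double demeaning no longer reproduces the dummy regression, which is one reason to let &lt;code>fixest&lt;/code> handle the absorption.&lt;/p>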
&lt;h3 id="63-annual-vs-5-year-averages">6.3 Annual vs 5-year averages&lt;/h3>
&lt;p>Annual data can be noisy because of business cycles, so Lessmann also estimates on 5-year averages. We build them by grouping years into six periods and averaging within country-period cells; the inverted-U survives (5-year quadratic ≈ +0.34 / −0.019), confirming the result is not a short-run artefact.&lt;/p>
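&lt;p>The grouping step can be sketched with &lt;code>cut&lt;/code> and &lt;code>aggregate&lt;/code>. This is a minimal sketch on a made-up two-country panel; the period boundaries (1980–84 through 2005–09), the data frame &lt;code>annual_toy&lt;/code> and its values are assumptions for illustration, not the post&amp;rsquo;s actual code.&lt;/p>
&lt;pre>&lt;code class="language-r">set.seed(4)
annual_toy &amp;lt;- data.frame(
  country = rep(c(&amp;quot;A&amp;quot;, &amp;quot;B&amp;quot;), each = 30),
  year    = rep(1980:2009, times = 2),
  wcv     = runif(60, 0.1, 0.5),               # placeholder values
  lnGDP   = runif(60, 7, 10)
)

# Six half-open bins: [1980, 1985), [1985, 1990), ..., [2005, 2010)
starts &amp;lt;- seq(1980, 2005, by = 5)
annual_toy$period &amp;lt;- cut(annual_toy$year,
                         breaks = c(starts, 2010), right = FALSE,
                         labels = paste(starts, starts + 4, sep = &amp;quot;-&amp;quot;))

# Average every variable within each country-period cell
five_year &amp;lt;- aggregate(cbind(wcv, lnGDP) ~ country + period,
                       data = annual_toy, FUN = mean)
head(five_year)
&lt;/code>&lt;/pre>
&lt;p>In an unbalanced panel each cell averages whatever years are available, so cells for gappy series rest on fewer observations than cells for countries with dense coverage.&lt;/p>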
&lt;h2 id="7-turning-points-and-the-discriminant-test">7. Turning points and the discriminant test&lt;/h2>
&lt;p>A cubic &lt;em>can&lt;/em> bend twice — but does it actually? And does it bend inside the range of incomes we observe? This section answers both. It is the most transferable skill in the post: any time you fit a cubic, these two checks tell you whether the curve really has the shape your coefficients seem to promise.&lt;/p>
&lt;h3 id="71-calculating-the-turning-points">7.1 Calculating the turning points&lt;/h3>
&lt;p>Where does the curve change direction? At a turning point the slope is zero, so we set the derivative of the cubic to zero:&lt;/p>
&lt;p>$$\frac{\partial \mathrm{WCV}}{\partial Y} = \beta_1 + 2\beta_2 Y + 3\beta_3 Y^2 = 0$$&lt;/p>
&lt;p>This is a &lt;em>quadratic&lt;/em> in $Y$, so it has at most two roots — the inverted-U peak and the high-income trough. One direct way to find them is &lt;code>polyroot&lt;/code>:&lt;/p>
&lt;pre>&lt;code class="language-r">bc &amp;lt;- coef(m5)
roots &amp;lt;- sort(Re(polyroot(c(bc[&amp;quot;lnGDP&amp;quot;], 2*bc[&amp;quot;I(lnGDP^2)&amp;quot;], 3*bc[&amp;quot;I(lnGDP^3)&amp;quot;]))))
data.frame(ln_gdp = roots, gdp_usd = round(exp(roots)))
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text"> ln_gdp gdp_usd
1 7.671 2146
2 10.356 31443
&lt;/code>&lt;/pre>
&lt;p>&lt;img src="r_kuznets_07_turning_points.png" alt="Turning points of the spatial Kuznets curve">&lt;/p>
&lt;p>Spatial inequality &lt;strong>rises&lt;/strong> with development up to ln(GDP) ≈ 7.7 (about &lt;strong>\$2,100&lt;/strong>), &lt;strong>falls&lt;/strong> until ln(GDP) ≈ 10.4 (about &lt;strong>\$31,000&lt;/strong>), and then &lt;strong>rises again&lt;/strong>. &lt;strong>Interpretation 1:&lt;/strong> the first threshold marks the industrial take-off where a few leading regions surge ahead; the second marks the maturity where convergence has run its course and post-industrial forces (tertiarisation) begin to pull rich regions apart again. Because the regressor is $\ln(\text{GDP})$, we exponentiate each root to read it back in dollars.&lt;/p>
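&lt;p>Without the fitted model object at hand, the same roots can be approximated from the rounded cross-section coefficients reported in §7.3, and the sign of the second derivative tells the peak from the trough. This is a sketch under that rounding assumption, so the roots should land close to, not exactly on, the values printed above.&lt;/p>
&lt;pre>&lt;code class="language-r">b1 &amp;lt;- 4.3965; b2 &amp;lt;- -0.4988; b3 &amp;lt;- 0.018447   # rounded cross-section cubic (§7.3)

# Roots of the derivative b1 + 2*b2*Y + 3*b3*Y^2 (coefficients in increasing order)
roots &amp;lt;- sort(Re(polyroot(c(b1, 2 * b2, 3 * b3))))

# Second derivative at each root: negative means a peak, positive a trough
curvature &amp;lt;- 2 * b2 + 6 * b3 * roots

tp &amp;lt;- data.frame(ln_gdp  = round(roots, 3),
                 gdp_usd = round(exp(roots)),
                 type    = ifelse(curvature &amp;lt; 0, &amp;quot;peak&amp;quot;, &amp;quot;trough&amp;quot;))
tp
&lt;/code>&lt;/pre>
&lt;p>Classifying each root by curvature guards against reading a trough as a peak when the sign pattern of the coefficients is less familiar than here.&lt;/p>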
&lt;h3 id="72-the-discriminant-does-the-curve-really-bend">7.2 The discriminant: does the curve really bend?&lt;/h3>
&lt;p>Computing the roots numerically works, but it hides &lt;em>why&lt;/em> a cubic sometimes has two turning points and sometimes none. The quadratic $\beta_1 + 2\beta_2 Y + 3\beta_3 Y^2 = 0$ has two real solutions exactly when its discriminant is positive. After dropping a harmless factor of 4 (see the algebra below), the rule simplifies to a single number:&lt;/p>
&lt;p>$$D \;\equiv\; \beta_2^2 - 3\,\beta_1\beta_3.$$&lt;/p>
&lt;p>There are three regimes:&lt;/p>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th>Discriminant&lt;/th>
&lt;th>Real turning points&lt;/th>
&lt;th>Shape over the real line&lt;/th>
&lt;th>Verdict&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>$D &amp;gt; 0$&lt;/td>
&lt;td>2&lt;/td>
&lt;td>rise–fall–rise when $\beta_3 &amp;gt; 0$ (an N-shape); fall–rise–fall when $\beta_3 &amp;lt; 0$&lt;/td>
&lt;td>the cubic shape is &lt;strong>real&lt;/strong>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>$D = 0$&lt;/td>
&lt;td>1 (inflection)&lt;/td>
&lt;td>a single flat spot, no reversal&lt;/td>
&lt;td>knife-edge boundary&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>$D &amp;lt; 0$&lt;/td>
&lt;td>0&lt;/td>
&lt;td>monotonic — never reverses&lt;/td>
&lt;td>the cubic shape is &lt;strong>not real&lt;/strong>&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;p>The standard quadratic-formula discriminant is $b^2-4ac = (2\beta_2)^2 - 4(3\beta_3)(\beta_1) = 4(\beta_2^2 - 3\beta_1\beta_3) = 4D$; the factor of 4 never changes the sign, so we work with $D = \beta_2^2 - 3\beta_1\beta_3$. When $D&amp;gt;0$, the turning-point locations come straight from the quadratic formula (then exponentiate to dollars):&lt;/p>
&lt;p>$$Y^{\star} = \frac{-\beta_2 \pm \sqrt{D}}{3\beta_3}, \qquad \mathrm{GDP}^{\star} = \exp\!\left(Y^{\star}\right).$$&lt;/p>
&lt;p>In R the whole test is two short functions:&lt;/p>
&lt;pre>&lt;code class="language-r">cubic_disc &amp;lt;- function(b1, b2, b3) b2^2 - 3 * b1 * b3 # the discriminant
cubic_tp &amp;lt;- function(b1, b2, b3) { # turning points (if any)
D &amp;lt;- cubic_disc(b1, b2, b3)
if (D &amp;lt;= 0) return(&amp;quot;no real turning points (monotonic)&amp;quot;)
sort(exp(c(-b2 - sqrt(D), -b2 + sqrt(D)) / (3 * b3))) # in GDP-per-capita units
}
bc &amp;lt;- coef(m5)
cubic_disc(bc[&amp;quot;lnGDP&amp;quot;], bc[&amp;quot;I(lnGDP^2)&amp;quot;], bc[&amp;quot;I(lnGDP^3)&amp;quot;])
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">[1] 0.005519
&lt;/code>&lt;/pre>
&lt;p>&lt;img src="r_kuznets_14_discriminant_regimes.png" alt="The discriminant decides whether the cubic bends">&lt;/p>
&lt;p>&lt;strong>Interpretation 2:&lt;/strong> the figure holds the linear and cubic coefficients fixed and changes &lt;em>only&lt;/em> the squared term. A small change flips the regime: when $D&amp;lt;0$ the curve climbs monotonically, at $D=0$ it develops a single flat inflection, and once $D&amp;gt;0$ it bends into the genuine rise–fall–rise N-shape. The sign of one number — the discriminant — is what separates &amp;ldquo;a cubic that bends&amp;rdquo; from &amp;ldquo;a cubic that merely curves.&amp;rdquo;&lt;/p>
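&lt;p>The knife-edge can be located directly: with the linear and cubic coefficients held fixed, $D = 0$ when $\beta_2^2 = 3\beta_1\beta_3$. The sketch below assumes the illustrative values $\beta_1 = 4.4$ and $\beta_3 = 0.018$ used for the synthetic cases in §7.3; the coefficients behind the figure itself are not stated here, so treat the numbers as an example only.&lt;/p>
&lt;pre>&lt;code class="language-r">b1 &amp;lt;- 4.4; b3 &amp;lt;- 0.018                     # illustrative values (synthetic cases, §7.3)

# Boundary value of the squared-term coefficient (negative branch), where D = 0
b2_boundary &amp;lt;- -sqrt(3 * b1 * b3)

b2_grid &amp;lt;- c(-0.40, b2_boundary, -0.50)    # just inside, on, and just past the boundary
D &amp;lt;- b2_grid^2 - 3 * b1 * b3

regime &amp;lt;- ifelse(abs(D) &amp;lt; 1e-12, &amp;quot;D = 0: flat inflection&amp;quot;,
          ifelse(D &amp;gt; 0, &amp;quot;D &amp;gt; 0: two turning points&amp;quot;, &amp;quot;D &amp;lt; 0: monotonic&amp;quot;))
data.frame(b2 = round(b2_grid, 4), D = round(D, 4), regime)
&lt;/code>&lt;/pre>
&lt;p>Under these values the boundary sits near $\beta_2 \approx -0.487$, so moving the squared term from −0.40 to −0.50 is enough to cross from the monotonic regime into the one with two turning points.&lt;/p>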
&lt;h3 id="73-two-checks-not-one-significance-is-not-shape">7.3 Two checks, not one: significance is not shape&lt;/h3>
&lt;p>Here is the trap. In our cross-section, &lt;em>all three&lt;/em> income terms are statistically significant (§5: $\beta_1=4.40^{***}$, $\beta_2=-0.499^{***}$, $\beta_3=0.018^{***}$). It is tempting to conclude &amp;ldquo;therefore the relationship is a genuine cubic with two turning points.&amp;rdquo; That inference is wrong as stated. Significance answers &lt;em>&amp;ldquo;does the data prefer keeping this term?&amp;rdquo;&lt;/em>; it does &lt;strong>not&lt;/strong> answer &lt;em>&amp;ldquo;does the fitted curve actually bend inside the income range we observe?&amp;rdquo;&lt;/em> The discriminant — plus a check on where the turning points fall — answers the second question. Applying both checks to this project&amp;rsquo;s two cubics, and to three illustrative cases, makes the distinction concrete:&lt;/p>
&lt;pre>&lt;code class="language-r"># applied to the cross-section cubic, the panel cubic, and three synthetic cases
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">  case                                  b1      b2        b3       D regime                               in_range
1 Cross-section cubic (significant) 4.3965 -0.4988  0.018447 +0.0055 2 turning points (both in range)        TRUE
2 Panel cubic (insignificant)       0.1875  0.0017 -0.000836 +0.0005 2 turning points (&amp;gt;=1 OUT of range)    FALSE
3 Synthetic 5a: genuine N-shape     4.4000 -0.5000  0.018000 +0.0124 2 turning points (both in range)        TRUE
4 Synthetic 5b: monotonic trap      4.4000 -0.4000  0.018000 -0.0776 monotonic (D&amp;lt;0)                         FALSE
5 Synthetic 5c: turns out of range  4.4000 -0.5000  0.001000 +0.2368 2 turning points (&amp;gt;=1 OUT of range)    FALSE
&lt;/code>&lt;/pre>
&lt;p>Read the rows from top to bottom:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Cross-section cubic&lt;/strong> — $D = +0.0055 &amp;gt; 0$ and &lt;em>both&lt;/em> turning points (\$2,146 and \$31,443) fall inside the observed income range (\$315–\$82,653). This is a genuine N-shape. Significance and shape agree.&lt;/li>
&lt;li>&lt;strong>Panel cubic&lt;/strong> — the within-country cubic term was &lt;em>insignificant&lt;/em> (§6, $t=-0.26$), so it fails the first check already. Even taking its coefficients at face value, $D&amp;gt;0$ but one implied turning point sits at roughly &lt;strong>\$0.0003&lt;/strong> — absurdly far below any real economy — so the curve does not bend inside the observed range. Two independent reasons to reject a within-country N-shape, exactly matching §6&amp;rsquo;s clean inverted-U.&lt;/li>
&lt;li>&lt;strong>Synthetic 5b&lt;/strong> (the trap) — &lt;em>same sign pattern&lt;/em> as the genuine case, only $\beta_2$ is a touch smaller in magnitude, and $D = -0.078 &amp;lt; 0$. The curve is monotonic everywhere. A cubic regression on such data could report all three terms as &amp;ldquo;significant&amp;rdquo; and still have no turning point at all.&lt;/li>
&lt;li>&lt;strong>Synthetic 5c&lt;/strong> — $D&amp;gt;0$, so two turning points exist &lt;em>mathematically&lt;/em>, but they land at \$86 and an astronomically high income. Inside any realistic range the curve is monotonic. &amp;ldquo;Two turning points exist&amp;rdquo; would be technically true and practically misleading.&lt;/li>
&lt;/ul>
&lt;p>&lt;strong>Interpretation 3:&lt;/strong> significance (does the data want the term?) and the discriminant-plus-range check (does the curve actually bend, and where?) are different questions, and you need both. Reporting &amp;ldquo;all three GDP terms are significant, so the curve is cubic&amp;rdquo; can be wrong in two distinct ways — the discriminant can be negative (5b), or the turning points can fall outside the data (5c). The honest workflow is: report the coefficients, compute $D$, and &lt;em>if&lt;/em> $D&amp;gt;0$ confirm the turning points lie inside the observed income range before claiming an inverted-U or N-shape.&lt;/p>
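&lt;p>The two checks can be sketched as a small base-R helper. This is a minimal illustration, not the code behind the table above: the name &lt;code>shape_check&lt;/code> and its arguments are hypothetical, the default range is the observed \$315–\$82,653 quoted above, and the cubic is assumed to be in log income with $\beta_3 \neq 0$.&lt;/p>
&lt;pre>&lt;code class="language-r"># Hypothetical helper (not from the original post): discriminant plus range check
# for a cubic b1*x + b2*x^2 + b3*x^3 in x = ln(GDP per capita), assuming b3 != 0.
shape_check &amp;lt;- function(b1, b2, b3, lo = 315, hi = 82653) {
  D &amp;lt;- b2^2 - 3 * b1 * b3                                # discriminant of the derivative
  if (D &amp;lt;= 0) {                                           # no pair of turning points
    return(data.frame(D = D, tp_low = NA_real_, tp_high = NA_real_, in_range = FALSE))
  }
  roots &amp;lt;- sort((-b2 + c(-1, 1) * sqrt(D)) / (3 * b3))   # turning points in logs
  tp &amp;lt;- exp(roots)                                        # back to dollars
  data.frame(D = D, tp_low = tp[1], tp_high = tp[2],
             in_range = all(tp &amp;gt;= lo &amp;amp; tp &amp;lt;= hi))
}

shape_check(4.4, -0.5, 0.018)   # synthetic 5a: D positive, both turning points in range
shape_check(4.4, -0.4, 0.018)   # synthetic 5b: D negative, monotonic
shape_check(4.4, -0.5, 0.001)   # synthetic 5c: D positive, turning points outside the range
&lt;/code>&lt;/pre>
&lt;p>Of the three synthetic cases, only 5a passes both checks, which is the pattern the table reports.&lt;/p>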
&lt;blockquote>
&lt;p>&lt;strong>Aside (for Bayesian model averaging).&lt;/strong> The same trap appears with a different label. In a BMA, a term&amp;rsquo;s posterior inclusion probability (PIP) near 1.00 is the Bayesian analogue of &amp;ldquo;statistically significant&amp;rdquo; — it says the data prefer keeping the term. But a high PIP on the cubic term no more guarantees a genuine bend than a significant cubic coefficient does: you still compute $D = \beta_2^2 - 3\beta_1\beta_3$ from the &lt;em>posterior-mean&lt;/em> coefficients and check the turning-point range. The companion note &lt;em>Turning Points and Discriminant Analysis&lt;/em> (Mendez, 2026) works through real cases — cross-country CO₂ ($D&amp;gt;0$, genuine) versus Chinese provincial PM₂.₅ (PIPs ≈ 1.00 but $D&amp;lt;0$, monotonic) — that make the point with field data.&lt;/p>
&lt;/blockquote>
&lt;h2 id="8-semiparametric-cross-section-the-robinson-estimator-table-4-fig-4">8. Semiparametric cross-section: the Robinson estimator (Table 4, Fig 4)&lt;/h2>
&lt;p>A polynomial &lt;em>forces&lt;/em> a shape. A &lt;strong>partially-linear model&lt;/strong> lets the income effect be any smooth curve while keeping the controls linear:&lt;/p>
&lt;p>$$\mathrm{WCV} = \alpha + f(Y) + \gamma X + \epsilon$$&lt;/p>
&lt;p>Robinson&amp;rsquo;s (1988) estimator is a clever &amp;ldquo;double residual&amp;rdquo; idea. First, partial $Y$ out of both the outcome and each control &lt;em>non-parametrically&lt;/em>. Second, run OLS on the two sets of residuals to recover $\gamma$. A final pass then smooths what is left of the outcome, $\mathrm{WCV} - \hat\gamma X$, against $Y$ to draw $f$.&lt;/p>
&lt;pre>&lt;code class="language-r"># Step 1: non-parametrically remove lnGDP from y and each control
resid_np &amp;lt;- function(v, z) residuals(npreg(v ~ z, regtype = &amp;quot;ll&amp;quot;, ckertype = &amp;quot;gaussian&amp;quot;))
ey &amp;lt;- resid_np(cs$wcv, cs$lnGDP)
eX &amp;lt;- sapply(Xnames, function(nm) resid_np(cs[[nm]], cs$lnGDP))
# Step 2: OLS of residualised y on residualised X -&amp;gt; linear part (Table 4)
rob &amp;lt;- lm(ey ~ eX - 1)
# np::npplreg implements the same estimator; its coefficients are close to these, not identical
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">             robinson_coef       t npplreg_coef
lnunits             0.1650  3.9405       0.1575
trade_gdp           0.0021  4.0348       0.0020
urbanization       -0.0057 -3.0047      -0.0056
federal            -0.0670 -1.7456      -0.0525
&lt;/code>&lt;/pre>
&lt;p>&lt;img src="r_kuznets_08_robinson_partial.png" alt="Robinson semiparametric partial fit with 90% band">&lt;/p>
&lt;p>The linear-part coefficients match the parametric estimates: more regions and more trade raise inequality, while urbanisation and federalism lower it. &lt;code>np::npplreg&lt;/code> returns &lt;em>very similar&lt;/em> numbers (0.1575 against 0.1650 for &lt;code>lnunits&lt;/code>, −0.0525 against −0.0670 for &lt;code>federal&lt;/code>), which supports the hand-built estimator even though the two sets of coefficients are not identical. The flexible curve $f(Y)$ traces the inverted-U with a high-income upturn, and the 90% band widens at the sparse low-income end. &lt;strong>Interpretation:&lt;/strong> because the curve was never told to be a cubic, its agreement with the parametric cubic is independent evidence that the N-shape is in the data, not an artefact of the polynomial.&lt;/p>
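&lt;p>To see why the double-residual step recovers the linear part, here is a self-contained sketch on simulated data. It is an illustration under stated assumptions, not the post&amp;rsquo;s code: base-R &lt;code>loess&lt;/code> stands in for the local-linear kernel regression from &lt;code>np&lt;/code>, and the data-generating process (a true $\gamma = 2$ and an inverted-U $f$) is invented for the example.&lt;/p>
&lt;pre>&lt;code class="language-r">set.seed(1)
n &amp;lt;- 500
z &amp;lt;- runif(n, 0, 3)                        # plays the role of lnGDP
x &amp;lt;- 0.5 * z + rnorm(n)                    # one control, correlated with z
f_true &amp;lt;- -(z - 1.5)^2                     # assumed inverted-U in z
y &amp;lt;- f_true + 2 * x + rnorm(n, sd = 0.5)   # true gamma = 2

# Step 1: remove z non-parametrically from y and x (loess instead of np::npreg)
ey &amp;lt;- residuals(loess(y ~ z))
ex &amp;lt;- residuals(loess(x ~ z))
# Step 2: OLS on the residuals recovers gamma
gamma_hat &amp;lt;- unname(coef(lm(ey ~ ex - 1)))
# Step 3: smooth what is left of y against z to draw f (up to a constant)
y_part &amp;lt;- y - gamma_hat * x
f_hat &amp;lt;- fitted(loess(y_part ~ z))
gamma_hat
&lt;/code>&lt;/pre>
&lt;p>The estimate should land close to the true value of 2, and plotting &lt;code>f_hat&lt;/code> against &lt;code>z&lt;/code> should trace the inverted-U that was built into the simulation.&lt;/p>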
&lt;h2 id="9-semiparametric-panel-the-baltagili-series-estimator-table-5-fig-5">9. Semiparametric panel: the Baltagi–Li series estimator (Table 5, Fig 5)&lt;/h2>
&lt;p>For the panel, Baltagi &amp;amp; Li (2002) remove the fixed effects and approximate $f(Y)$ with a &lt;strong>cubic B-spline&lt;/strong> (order $k = 4$). We implement the same idea in &lt;code>fixest&lt;/code>: a B-spline basis of the income term enters the regression, and country and year fixed effects are absorbed.&lt;/p>
&lt;pre>&lt;code class="language-r">B &amp;lt;- splines::bs(annual$lnGDP, degree = 3, df = 5) # cubic B-spline (order k=4)
colnames(B) &amp;lt;- paste0(&amp;quot;bs&amp;quot;, 1:5)
m_bl &amp;lt;- feols(wcv ~ bs1+bs2+bs3+bs4+bs5 + trade_gdp + urbanization |
country + year, data = cbind(annual, B), vcov = &amp;quot;hetero&amp;quot;)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">             estimate      t within_r2
trade_gdp      0.0002  0.564     0.021 (annual)
urbanization  -0.0027 -2.785     0.021 (annual)
urbanization  -0.0029 -2.634     0.068 (5-year)
&lt;/code>&lt;/pre>
&lt;p>&lt;img src="r_kuznets_09_baltagili_annual.png" alt="Baltagi–Li semiparametric fit, annual panel">
&lt;img src="r_kuznets_10_baltagili_5yr.png" alt="Baltagi–Li semiparametric fit, 5-year averages">&lt;/p>
&lt;p>Trade is insignificant and &lt;strong>urbanisation is significantly negative&lt;/strong> (−0.0027** annual, −0.0029** on 5-year averages), matching the paper&amp;rsquo;s Table 5. The recovered $f(Y)$ curves show the within-country inverted-U with &lt;strong>no upturn&lt;/strong> at high income — the same message as the parametric panel, now without assuming a polynomial. &lt;strong>Interpretation:&lt;/strong> two very different flexible methods (kernel-based Robinson and spline-based Baltagi–Li) agree with the parametric models, which is exactly the kind of triangulation that makes a descriptive finding credible.&lt;/p>
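&lt;p>How the curve $f(Y)$ is read off a spline regression can be sketched in base R. The example is a stand-in under stated assumptions: &lt;code>lm&lt;/code> with factor dummies replaces &lt;code>feols&lt;/code>, the panel is simulated (30 countries, 20 years, an invented inverted-U), and the curve is identified only up to an additive constant because the fixed effects absorb its level.&lt;/p>
&lt;pre>&lt;code class="language-r">library(splines)
set.seed(2)
d &amp;lt;- expand.grid(country = 1:30, year = 1:20)
base &amp;lt;- runif(30, 6.5, 9.5)                            # country income levels
d$lnGDP &amp;lt;- base[d$country] + runif(nrow(d), -1.5, 1.5)
f_true &amp;lt;- -0.1 * (d$lnGDP - 8)^2                       # assumed inverted-U
alpha &amp;lt;- rnorm(30, sd = 0.2)                           # country fixed effects
d$wcv &amp;lt;- alpha[d$country] + f_true + rnorm(nrow(d), sd = 0.05)

B &amp;lt;- bs(d$lnGDP, degree = 3, df = 5)                   # cubic B-spline basis
fit &amp;lt;- lm(wcv ~ B + factor(country) + factor(year), data = d)
beta &amp;lt;- coef(fit)[grep(&amp;quot;^B&amp;quot;, names(coef(fit)))]        # the five spline coefficients
f_hat &amp;lt;- as.numeric(B %*% beta)                        # fitted curve, up to a constant
&lt;/code>&lt;/pre>
&lt;p>Plotting &lt;code>f_hat&lt;/code> against &lt;code>d$lnGDP&lt;/code> draws the recovered curve. The same basis-times-coefficients product is what a figure of this kind would plot from a fixed-effects fit.&lt;/p>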
&lt;h2 id="10-the-sectoral-channel-table-6">10. The sectoral channel (Table 6)&lt;/h2>
&lt;p>Kuznets and Williamson argued that the &lt;em>real&lt;/em> driver is &lt;strong>structural change&lt;/strong> — the shift from agriculture to industry and services — with income just a proxy. We test this directly by replacing income with the &lt;strong>non-agricultural share of gross value added&lt;/strong>.&lt;/p>
&lt;pre>&lt;code class="language-r">s4 &amp;lt;- lm(wcv ~ nonag + I(nonag^2) + lnunits + lnarea + area_units +
ethnic + trade_gdp + urbanization + federal, cs)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-text">== Table 6: sectoral data (non-agricultural GVA / GDP) ==
nonag = 0.0165*** nonag^2 = -0.00014***
&lt;/code>&lt;/pre>
&lt;p>&lt;img src="r_kuznets_11_sectoral.png" alt="The sectoral channel behind the Kuznets curve">&lt;/p>
&lt;p>Spatial inequality &lt;strong>rises then falls&lt;/strong> with the non-agricultural share (&lt;strong>+0.0165***&lt;/strong> and &lt;strong>−0.00014***&lt;/strong>) — an inverted-U in the structural variable itself. &lt;strong>Interpretation:&lt;/strong> this is the mechanism behind the income result. As an economy industrialises, a few regions capture the new activity and gaps widen; as the modern sector matures and spreads, gaps narrow. Development raises inequality &lt;em>because&lt;/em> it reshuffles where output is produced.&lt;/p>
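&lt;p>The coefficients also pin down where the curve peaks. As a worked check using the rounded estimates printed above (so the result is approximate), the turning point of a quadratic $b_1 s + b_2 s^2$ sits at $s^* = -b_1 / (2 b_2)$:&lt;/p>
&lt;pre>&lt;code class="language-r">b1 &amp;lt;- 0.0165      # coefficient on nonag (rounded, from the output above)
b2 &amp;lt;- -0.00014    # coefficient on nonag^2 (rounded)
peak &amp;lt;- -b1 / (2 * b2)
peak               # about 58.9
&lt;/code>&lt;/pre>
&lt;p>On these rounded numbers, spatial inequality peaks when the non-agricultural share is roughly 59, assuming &lt;code>nonag&lt;/code> is measured on a 0–100 percentage scale, as the coefficient magnitudes suggest.&lt;/p>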
&lt;h2 id="11-robustness">11. Robustness&lt;/h2>
&lt;h3 id="111-excluding-the-poorest-countries">11.1 Excluding the poorest countries&lt;/h3>
&lt;p>The rising arm of the curve depends on poor countries. Dropping those with GDP per capita below \$1,000 weakens the full inverted-U — the cubic no longer traces the complete shape, just as the paper finds.&lt;/p>
&lt;p>&lt;img src="r_kuznets_13_exclude_poorest.png" alt="Robustness: excluding the poorest countries">&lt;/p>
&lt;h3 id="112-excluding-capital-regions">11.2 Excluding capital regions&lt;/h3>
&lt;p>Capital regions are often far richer than the rest of a country. Recomputing the WCV without them and correlating with the original gives &lt;strong>0.84&lt;/strong> (paper 0.81) — capitals matter in individual cases but do not overturn the cross-country picture.&lt;/p>
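&lt;p>Mechanically, the recomputation amounts to dropping one row and re-applying the index. The sketch below assumes the Williamson-style definition, $\mathrm{WCV} = \frac{1}{\bar y}\sqrt{\sum_i p_i (y_i - \bar y)^2}$ with population shares $p_i$ and $\bar y = \sum_i p_i y_i$. The helper name &lt;code>wcv&lt;/code> used as a function and the four-region country are made up for illustration and are not the post&amp;rsquo;s data.&lt;/p>
&lt;pre>&lt;code class="language-r"># Hypothetical helper: population-weighted coefficient of variation
wcv &amp;lt;- function(y, pop) {
  p &amp;lt;- pop / sum(pop)             # population shares
  ybar &amp;lt;- sum(p * y)              # population-weighted mean income
  sqrt(sum(p * (y - ybar)^2)) / ybar
}

# Made-up country: region 1 is a rich capital
y   &amp;lt;- c(60, 20, 15, 10)          # regional GDP per capita
pop &amp;lt;- c(2, 3, 3, 2)              # regional population
wcv_all    &amp;lt;- wcv(y, pop)
wcv_no_cap &amp;lt;- wcv(y[-1], pop[-1])
c(with_capital = wcv_all, without_capital = wcv_no_cap)
&lt;/code>&lt;/pre>
&lt;p>In this toy country the index falls from about 0.74 to about 0.25 once the capital is dropped, which shows how much a single rich region can move an individual country&amp;rsquo;s value even when the cross-country ranking is largely preserved.&lt;/p>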
&lt;h3 id="113-alternative-inequality-measures">11.3 Alternative inequality measures&lt;/h3>
&lt;p>Swapping the population-weighted WCV for the unweighted coefficient of variation or a regional Gini leaves the cubic in place — the inverted-U is not an artefact of the particular index.&lt;/p>
&lt;h3 id="114-income-in-logs-vs-levels-fig-7">11.4 Income in logs vs levels (Fig 7)&lt;/h3>
&lt;p>&lt;img src="r_kuznets_12_log_vs_level.png" alt="Why the high-income upturn is fragile: logs vs levels">&lt;/p>
&lt;p>This is the most important caveat. With income in &lt;strong>logs&lt;/strong> there is no high-income upturn; with income in &lt;strong>levels&lt;/strong> a slight upturn reappears. &lt;strong>Interpretation:&lt;/strong> the existence of the upturn is partly a measurement choice. The robust finding is the inverted-U; the N-shape is real but fragile — which is why Lessmann hedges it and why we should too.&lt;/p>
&lt;h2 id="12-summary-statistics-table-a3">12. Summary statistics (Table A.3)&lt;/h2>
&lt;p>&lt;img src="r_kuznets_tableA3_summary.png" alt="Summary statistics for the synthetic cross-section">&lt;/p>
&lt;p>The synthetic variables match the paper&amp;rsquo;s Table A.3 within about ±10% on every dimension: WCV mean 0.36 (paper 0.35), ln(units) mean 2.39, ln(area) mean 12.69, ethnic fractionalisation 0.31, Trade/GDP 82, urbanisation 69, federal share 0.21. Anchoring the marginal distributions to the paper is what makes the regression coefficients land in the right place.&lt;/p>
&lt;h2 id="13-discussion">13. Discussion&lt;/h2>
&lt;p>So, &lt;strong>is there an inverted-U?&lt;/strong> On this synthetic data, calibrated to the paper, the answer is a clear &lt;em>yes&lt;/em> — with a nuance. Between countries, the relationship is N-shaped: spatial inequality rises until about \$2,100 of GDP per capita, falls until about \$31,000, then edges up again. Within countries, the relationship is a &lt;em>clean&lt;/em> inverted-U with no upturn. The two pictures are reconciled by recognising that the high-income upturn is a &lt;em>cross-sectional&lt;/em> feature — rich service economies are simply more spatially unequal than rich manufacturing ones — rather than something a developing country marches through.&lt;/p>
&lt;p>What does this mean for policy? Wide regional gaps are, to a first approximation, &lt;strong>transitional&lt;/strong>: they tend to widen during industrial take-off and narrow as economies mature. That is cautiously good news, but the transition can take decades and the gaps can be politically dangerous while they last. The sectoral result points to the lever: because structural change drives the curve, investing in the human capital and connectivity of lagging regions can shorten the painful middle stretch.&lt;/p>
&lt;h2 id="14-summary-and-next-steps">14. Summary and next steps&lt;/h2>
&lt;ul>
&lt;li>&lt;strong>The inverted-U is robust.&lt;/strong> Across parametric OLS, two-way fixed effects, and two semiparametric estimators, spatial inequality rises then falls with development.&lt;/li>
&lt;li>&lt;strong>The high-income upturn is fragile.&lt;/strong> It appears between countries and in levels, but vanishes within countries and under the log transform.&lt;/li>
&lt;li>&lt;strong>Fixed effects change the conclusion&lt;/strong>, not just the standard errors — the upturn is between-country, not within-country.&lt;/li>
&lt;li>&lt;strong>Structural change is the mechanism&lt;/strong>: the non-agricultural share reproduces the same curve.&lt;/li>
&lt;li>&lt;strong>Next steps.&lt;/strong> Re-run the simulation with a different seed to see sampling variability; cluster the panel standard errors by country; or extend the data window and test whether the second turning point moves.&lt;/li>
&lt;/ul>
&lt;h2 id="15-exercises">15. Exercises&lt;/h2>
&lt;ol>
&lt;li>&lt;strong>Re-seed the world.&lt;/strong> Change &lt;code>set.seed(123)&lt;/code> to another value and re-run. Which coefficients are stable and which bounce around? What does that tell you about the fragility of the cubic?&lt;/li>
&lt;li>&lt;strong>Cluster the standard errors.&lt;/strong> Re-estimate the panel with &lt;code>vcov = ~country&lt;/code> instead of &lt;code>&amp;quot;hetero&amp;quot;&lt;/code>. Do the quadratic terms stay significant? Why might clustering matter here?&lt;/li>
&lt;li>&lt;strong>Swap the measure.&lt;/strong> Replace &lt;code>wcv&lt;/code> with the regional Gini (&lt;code>gini_reg&lt;/code>) in the cross-section cubic. Does the inverted-U survive? What does that say about measurement robustness?&lt;/li>
&lt;li>&lt;strong>Apply the discriminant.&lt;/strong> A colleague fits a cubic and reports $\beta_1 = 4.4$, $\beta_2 = -0.40$, $\beta_3 = 0.018$, all significant. Compute $D = \beta_2^2 - 3\beta_1\beta_3$ by hand. Does the curve have two turning points? (Compare your answer with synthetic case 5b in §7.3.) Then halve $\beta_3$ to $0.009$ and recompute. Does the verdict change, and would you trust two turning points that fall at roughly \$1,500 and \$5 billion of GDP per capita when the observed range tops out at \$82,653?&lt;/li>
&lt;/ol>
&lt;h2 id="16-references">16. References&lt;/h2>
&lt;ul>
&lt;li>Lessmann, C. (2014). Spatial inequality and development — Is there an inverted-U relationship? &lt;em>Journal of Development Economics&lt;/em>, 106, 35–51.&lt;/li>
&lt;li>Kuznets, S. (1955). Economic growth and income inequality. &lt;em>American Economic Review&lt;/em>, 45(1), 1–28.&lt;/li>
&lt;li>Williamson, J. G. (1965). Regional inequality and the process of national development: A description of the patterns. &lt;em>Economic Development and Cultural Change&lt;/em>, 13(4), 1–84.&lt;/li>
&lt;li>Robinson, P. M. (1988). Root-N-consistent semiparametric regression. &lt;em>Econometrica&lt;/em>, 56(4), 931–954.&lt;/li>
&lt;li>Baltagi, B. H., &amp;amp; Li, D. (2002). Series estimation of partially linear panel data models with fixed effects. &lt;em>Annals of Economics and Finance&lt;/em>, 3, 103–116.&lt;/li>
&lt;li>Mendez, C. (2026). &lt;em>Turning Points and Discriminant Analysis&lt;/em> — a note on why high posterior inclusion probabilities (or statistical significance) do not guarantee a genuine cubic shape.&lt;/li>
&lt;li>Gravina, A. F., &amp;amp; Lanzafame, M. (2025). &amp;ldquo;What&amp;rsquo;s your shape?&amp;rdquo; A data-driven approach to estimating the Environmental Kuznets Curve. &lt;em>Energy Economics&lt;/em>, 148.&lt;/li>
&lt;li>Eicher, T. S., Papageorgiou, C., &amp;amp; Raftery, A. E. (2011). Default priors and predictive performance in Bayesian model averaging, with application to growth determinants. &lt;em>Journal of Applied Econometrics&lt;/em>, 26(1), 30–55.&lt;/li>
&lt;li>Bergé, L. (2018). Efficient estimation of maximum likelihood models with multiple fixed effects: the R package &lt;code>FENmlm&lt;/code>. &lt;em>CREA Discussion Papers&lt;/em>, 13.&lt;/li>
&lt;li>Hayfield, T., &amp;amp; Racine, J. S. (2008). Nonparametric econometrics: the &lt;code>np&lt;/code> package. &lt;em>Journal of Statistical Software&lt;/em>, 27(5).&lt;/li>
&lt;/ul>
</description></item></channel></rss>