Classifier-LASSO in Stata — when the pooled average is a lie
Nagoya University (GSID)
June 11, 2026
Act I
Acemoglu, Naidu, Restrepo & Robinson (2019) found democracy causes growth — one pooled coefficient for all 98 countries.
But a single slope forces Japan and Nigeria to react identically. What if the average is hiding two opposite stories?
Savings-to-GDP ratio across 56 countries (1995–2010); each line is one country, revealing wide dispersion in dynamics.
Act II
C-LASSO sits in the middle, between one pooled slope for every unit and a separate slope for each unit: \(K\) latent groups, slopes shared within a group, free across groups.
\[Q_{NT,\lambda}^{(K)} = \frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T}\big(y_{it}-\boldsymbol{\beta}_i'\mathbf{x}_{it}\big)^2 + \frac{\lambda_{NT}}{N}\sum_{i=1}^{N}\prod_{k=1}^{K}\|\boldsymbol{\beta}_i-\boldsymbol{\alpha}_k\|\]
The product penalty \(\prod_k\|\boldsymbol{\beta}_i-\boldsymbol{\alpha}_k\|\) is exactly zero when \(\boldsymbol{\beta}_i\) coincides with any one center \(\boldsymbol{\alpha}_k\), and small when it is close to one. That is what sorts unit \(i\) into a group.
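A minimal Python sketch of the penalty term for a single unit may make the sorting mechanism concrete. The function name `product_penalty` and the two-regressor centers are hypothetical, chosen only for illustration; they are not estimates from the talk.

```python
import math

def product_penalty(beta_i, centers):
    """Product over groups k of the Euclidean distance ||beta_i - alpha_k||."""
    penalty = 1.0
    for alpha_k in centers:
        penalty *= math.dist(beta_i, alpha_k)
    return penalty

# Hypothetical group centers (two regressors), for illustration only.
centers = [(-0.2, 0.3), (0.5, 0.1)]

at_center = product_penalty((-0.2, 0.3), centers)      # sits exactly on center 1
near_center = product_penalty((-0.19, 0.31), centers)  # close to center 1
in_between = product_penalty((0.15, 0.2), centers)     # far from both centers

print(at_center, near_center, in_between)
```

A unit pays nothing for sitting on either center, little for being close to one, and the most for hovering between them. One small factor is enough to make the whole product small, so the unit needs to match only one group.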
| Variable | Pooled / FE coef. | Significant? |
|---|---|---|
| lagsavings | 0.605 | yes |
| gdp | 0.188 | yes |
| cpi | +0.030 | no |
| interest | +0.006 | no |
The textbook read: “inflation and interest rates don’t affect savings.” Hold that thought.
IC (left axis) and iterations to convergence (right axis) by number of groups, static savings model — minimized at K=2.
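For reference, the information criterion behind this curve is usually written as below, following Su, Shi & Phillips (2016). This is a sketch in the notation of the objective above; \(\hat{G}_k\) (the estimated membership of group \(k\)), \(p\) (the number of regressors) and the tuning sequence \(\rho_{NT}\) do not appear on the slides and are assumed here.

\[\mathrm{IC}(K,\lambda) = \ln\hat{\sigma}^2_{\hat{G}(K,\lambda)} + \rho_{NT}\,p\,K, \qquad \hat{\sigma}^2_{\hat{G}(K,\lambda)} = \frac{1}{NT}\sum_{k=1}^{K}\sum_{i\in\hat{G}_k}\sum_{t=1}^{T}\big(y_{it}-\hat{\boldsymbol{\alpha}}_k'\mathbf{x}_{it}\big)^2\]

The first term rewards fit; the second charges for the \(p\) extra slopes that each additional group brings.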
| Group | cpi | interest | gdp |
|---|---|---|---|
| Group 1 (34 countries) | −0.181 | −0.197 | +0.335 |
| Group 2 (22 countries) | +0.478 | +0.263 | +0.112 |
All four highlighted CPI/interest coefficients are significant at \(p < 0.001\). The pooled CPI coefficient of +0.030 was \(-0.181\) and \(+0.478\) largely canceling.
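The cancellation can be reproduced on a toy panel. The sketch below takes the group sizes and CPI slopes from the table, but everything else is an assumption: one regressor, no noise, and the same made-up regressor path for every country. It illustrates the mechanism and is not a replication of the estimates.

```python
def within_slope(panel):
    """Fixed-effects (within) slope for one regressor:
    demean x and y by unit, then run pooled OLS on the demeaned data."""
    sxy = sxx = 0.0
    for xs, ys in panel:
        x_bar = sum(xs) / len(xs)
        y_bar = sum(ys) / len(ys)
        for x, y in zip(xs, ys):
            sxy += (x - x_bar) * (y - y_bar)
            sxx += (x - x_bar) ** 2
    return sxy / sxx

# Group sizes and CPI slopes from the table above.
groups = [(34, -0.181), (22, 0.478)]
# Assumption: an identical 16-period regressor path for every country.
x_path = [float(t) for t in range(16)]

# One (x, y) pair of series per country, noise-free by construction.
panel = [
    (x_path, [slope * x for x in x_path])
    for n_countries, slope in groups
    for _ in range(n_countries)
]

pooled = within_slope(panel)
group_1 = within_slope(panel[:34])
group_2 = within_slope(panel[34:])
print(round(pooled, 3), round(group_1, 3), round(group_2, 3))
# prints: 0.078 -0.181 0.478
```

In this noise-free setting the pooled slope is just the count-weighted mean of the two group slopes, about 0.078. The actual pooled estimate of +0.030 differs because it also depends on each country's regressor variation and on the other covariates, but the direction is the same: two sizable effects of opposite sign average out to something close to zero.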
CPI coefficient and 95% bands by group, dynamic savings model: Group 1 negative, Group 2 positive, non-overlapping bands.
Interest-rate coefficient and 95% bands by group, dynamic savings model: the same negative-vs-positive split as CPI.
Act III
| Term | Coefficient | Clustered SE | p |
|---|---|---|---|
| Democracy | +1.055 | 0.370 | 0.005 |
| lagged GDP | +0.970 | 0.006 | <0.001 |
Standard errors clustered on 98 countries. This replicates Acemoglu et al. (2019): on average, democracy promotes growth.
IC and iteration count, democracy model — minimized at K=2, but IC values span just 0.013 across all K.
+2.151
Democracy effect on log GDP in Group 1 (57 countries, p < 0.001); in Group 2 (41 countries) the effect is −0.936 (p = 0.007).
Democracy coefficient by country and group: Group 1 (57) clusters near +2.2, Group 2 (41) near −1.0; pooled +1.05 fits neither.
| | Pooled FE | Group 1 | Group 2 |
|---|---|---|---|
| Democracy coef. | +1.055 | +2.151 | −0.936 |
| Clustered SE | 0.370 | 0.546 | 0.348 |
| p-value | 0.005 | <0.001 | 0.007 |
| Countries | 98 | 57 | 41 |
This is a Simpson's-paradox-style aggregation problem in a panel: the positive aggregate effect holds in one subgroup and reverses sign in the other.
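A quick check of how far apart the two groups are, using only the coefficients and clustered SEs in the comparison table. This is a sketch under two assumptions not stated in the talk: normal-approximation 95% bands (z = 1.96), and independence of the two group estimates when forming the z-statistic for their gap.

```python
import math

# (coefficient, country-clustered SE) copied from the comparison table.
estimates = {
    "Pooled FE": (1.055, 0.370),
    "Group 1": (2.151, 0.546),
    "Group 2": (-0.936, 0.348),
}

def band(coef, se, z=1.96):
    """Normal-approximation 95% confidence band."""
    return (coef - z * se, coef + z * se)

def overlap(a, b):
    """True if two closed intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]

bands = {name: band(coef, se) for name, (coef, se) in estimates.items()}
groups_overlap = overlap(bands["Group 1"], bands["Group 2"])

# z-statistic for the Group 1 minus Group 2 gap,
# assuming the two estimates are independent.
(b1, se1), (b2, se2) = estimates["Group 1"], estimates["Group 2"]
z_gap = (b1 - b2) / math.sqrt(se1 ** 2 + se2 ** 2)

for name, (lo, hi) in bands.items():
    print(f"{name}: [{lo:.3f}, {hi:.3f}]")
print("groups overlap:", groups_overlap, "| z for gap:", round(z_gap, 2))
```

Under these assumptions the bands are roughly [1.081, 3.221] for Group 1 and [−1.618, −0.254] for Group 2, so they do not overlap, and the gap between the two coefficients is about 4.8 standard errors.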
Objection. You sorted countries by their coefficients, so of course the groups have different coefficients — and a sign flip sounds like overfitting.
Response. The sorting is penalized, and the number of groups is chosen by an information criterion (IC) that charges for every extra group and picks K = 2 in all three specifications; the postlasso re-fit gives honest, country-clustered SEs with non-overlapping bands. C-LASSO chooses the grouping flexibly, but it does not relax the identifying assumptions of Acemoglu et al. (2019). These remain conditional associations.