How persistent is firm employment? Two wrong answers that bracket the truth
If this year's employment depends on last year's, the model needs yesterday's outcome on the right-hand side — and that one change breaks both workhorse panel estimators in opposite directions. Pooled OLS loads the omitted firm effect onto the lag and lands above the true persistence ρ; the within (fixed-effects) estimator picks up Nickell bias and lands below it. Two wrong answers with known signs are a measuring stick: the truth must sit inside the bracket. In the post's real data the bracket is [0.626, 0.962]. Set the true ρ yourself, then drag T and N and watch the bracket form around it.
Why the bracket tightens as T grows (animation)
Average ρ̂ across simulated panels as the panel length T grows, holding true ρ = 0.80. The grey curve is pooled OLS: biased up by the firm effect αi, and longer panels do not save it. The orange curve is fixed effects: biased down, with a bias of order 1/T (Nickell), so it climbs toward the truth as T grows. The bracket between them remains wide exactly where the post's data lives, at T ≈ 7–9 years per firm.
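As a rough guide to the size of the orange curve's gap, here is a minimal sketch of the well-known large-T approximation to the Nickell bias, −(1 + ρ)/(T − 1). It assumes a pure AR(1) panel with no covariates, and T counts the periods used in estimation; the function name is made up for illustration, and the approximation gets crude for very short panels.

```python
def nickell_bias_approx(rho: float, n_periods: int) -> float:
    """Large-T approximation to the within-estimator bias in a pure AR(1) panel.

    Assumption: no covariates, N large, and n_periods is the number of
    periods used in estimation. Exact short-panel formulas differ.
    """
    if n_periods < 2:
        raise ValueError("need at least two periods")
    return -(1.0 + rho) / (n_periods - 1)


if __name__ == "__main__":
    for T in (4, 8, 20, 50):
        bias = nickell_bias_approx(0.80, T)
        print(f"T={T:2d}  approx bias={bias:+.3f}  approx FE plim={0.80 + bias:.3f}")
```

The bias shrinks like 1/T and never depends on N, which is why adding firms does not close the gap.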
Simulate your own panel
What to look for
- Drag T down to 3 or 4. The orange FE bar sinks far below the steel true-ρ line — at the post's T ≈ 7–9 the simulated FE bias is the same order as the real one (0.626 against a defended 0.927).
- Drag T up to 20. FE climbs toward the truth while the grey OLS bar barely moves: Nickell bias is a short-panel disease, the OLS bias is a fixed-effect disease, and only one of them is cured by time.
- Set ρ = 0.95. The bracket is widest when the series is most persistent — exactly when you most need GMM and when difference GMM's instruments are weakest (Tab 2 shows this on the real data).
- Push N to 500 and reseed. The bars stop jittering but stay on the wrong side of the line. More firms buy precision, never consistency — the bracket is a bias problem, not a noise problem.
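The bullets above can be reproduced offline with a small Monte Carlo. This is a sketch under stated assumptions, not the page's own simulator: a pure AR(1) panel with no covariates, unit-variance firm effects and shocks, |ρ| < 1, and hypothetical function names throughout.

```python
import numpy as np


def simulate_panel(n_firms, n_periods, rho, rng, burn_in=50):
    """Simulate y[i,t] = rho * y[i,t-1] + alpha[i] + eps[i,t] (no covariates)."""
    alpha = rng.normal(size=n_firms)   # firm effects; unit variance is an assumption
    y = alpha / (1.0 - rho)            # start each firm at its long-run mean
    panel = np.empty((n_firms, n_periods))
    for t in range(burn_in + n_periods):
        y = rho * y + alpha + rng.normal(size=n_firms)
        if t >= burn_in:
            panel[:, t - burn_in] = y
    return panel


def pooled_ols_rho(panel):
    """Regress y[t] on y[t-1] with a common intercept, ignoring firm effects."""
    lag, cur = panel[:, :-1].ravel(), panel[:, 1:].ravel()
    lag_c = lag - lag.mean()
    return float(lag_c @ (cur - cur.mean()) / (lag_c @ lag_c))


def within_rho(panel):
    """Fixed-effects estimator: demean y[t] and y[t-1] firm by firm."""
    lag, cur = panel[:, :-1], panel[:, 1:]
    lag_c = lag - lag.mean(axis=1, keepdims=True)
    cur_c = cur - cur.mean(axis=1, keepdims=True)
    return float((lag_c * cur_c).sum() / (lag_c ** 2).sum())


def average_estimates(rho, n_firms, n_periods, n_reps=50, seed=0):
    """Average FE and OLS estimates over n_reps simulated panels."""
    rng = np.random.default_rng(seed)
    panels = [simulate_panel(n_firms, n_periods, rho, rng) for _ in range(n_reps)]
    fe = np.mean([within_rho(p) for p in panels])
    ols = np.mean([pooled_ols_rho(p) for p in panels])
    return float(fe), float(ols)


if __name__ == "__main__":
    for T in (4, 8, 20):
        fe, ols = average_estimates(rho=0.80, n_firms=140, n_periods=T)
        print(f"T={T:2d}  FE={fe:.3f}  true=0.800  OLS={ols:.3f}")
```

Raising n_firms steadies the averages without moving them across the true-ρ line; raising n_periods lifts only the FE average.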
The Estimator Ladder
The post's seven real estimates on one axis — OLS, FE, Anderson-Hsiao IV, difference and system GMM — with the bracket band shaded and a story behind every point.
Diagnostics Decoder
AR(1), AR(2), and Hansen read correctly — including why p = 0.000 can be good news and p = 0.99 can be a red flag — plus the instrument-proliferation experiment.
Method Chooser
Answer three questions about your own panel and get the estimator the post's workflow would recommend, with the full practitioner checklist.
Glossary (open a card if a term is unfamiliar)
Lagged dependent variable & ρ
n[i,t] = ρ n[i,t-1] + β x[i,t] + α_i + ε[i,t]. ρ is the persistence: the fraction of this year's employment inherited from last year. The post's headline: ρ̂ = 0.927, so ~93% of a shock survives into the next year.
Firm fixed effect αi
Nickell bias
The bias bracket (Bond 2002)
Sequential exogeneity
Difference GMM (Arellano-Bond 1991)
System GMM & mean stationarity (Blundell-Bond 1998)
Instrument proliferation & collapsing
Seven estimators, one parameter — the post's real results
Every number below comes from the post's production run on the classic Arellano-Bond (1991) panel of 140 UK manufacturing firms, 1976–1984 (estimates_summary.csv). All seven estimators target the same ρ; they disagree because each treats the firm effect αi differently. The shaded band is the OLS–FE bracket [0.626, 0.962]; the dashed line is the unit root ρ = 1. Toggle estimators on and off, hover a point for its numbers, and click it for its story.
Click any estimate above for its story
Each point on the ladder has a one-paragraph biography: what the estimator does to the firm effect, why it lands where it lands, and what the post concluded about it. Click a point (or its CI bar) to read it here.
What to look for
- The bracket band does the first screening. OLS (0.962) and FE (0.626) define it; Anderson-Hsiao (1.233) lands entirely outside it, above the unit root — consistent in theory, useless in practice with a CI 1.87 wide.
- Both difference-GMM points hug the floor. 0.708 and 0.679 sit within one SE of the FE bound — Bond's weak-instrument symptom. They pass every printed test (Hansen p = 0.211, AR(2) p = 0.866) and are still not to be trusted.
- System GMM lands in the upper half of the band. 0.902 (one-step) and the headline 0.927 (two-step, 32 collapsed instruments) — the only estimates that pass the bracket check and the printed diagnostics together.
- The headline CI touches the dashed line. [0.773, 1.081] includes ρ = 1: the post can defend "persistence ≈ 0.93", not "employment is stationary". Honest reporting keeps the caveat attached.
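The screening in the bullets above is simple enough to script. The sketch below uses only numbers quoted on this page; the function names are hypothetical, and the standard error attached to the headline estimate (0.0785) is an assumption borrowed from the collapsed 2:99 specification quoted in the diagnostics tab.

```python
FE_BOUND, OLS_BOUND = 0.626, 0.962   # the OLS-FE bracket quoted in the text

ESTIMATES = {
    "Anderson-Hsiao IV": 1.233,
    "Difference GMM (0.708)": 0.708,
    "Difference GMM (0.679)": 0.679,
    "System GMM, one-step": 0.902,
    "System GMM, two-step": 0.927,
}


def in_bracket(rho_hat, lower=FE_BOUND, upper=OLS_BOUND):
    """True if the estimate lies between the FE and OLS bounds."""
    return lower <= rho_hat <= upper


def wald_ci(rho_hat, se, z=1.96):
    """Symmetric normal-approximation confidence interval."""
    return rho_hat - z * se, rho_hat + z * se


if __name__ == "__main__":
    for name, rho_hat in ESTIMATES.items():
        verdict = "inside" if in_bracket(rho_hat) else "OUTSIDE"
        print(f"{name:24s} {rho_hat:.3f}  {verdict}")

    # Assumed SE for the headline estimate; see the lead-in.
    lo, hi = wald_ci(0.927, 0.0785)
    print(f"95% CI: [{lo:.3f}, {hi:.3f}]  contains rho = 1: {lo <= 1.0 <= hi}")
```

Being inside the bracket is a necessary screen, not a sufficient one: both difference-GMM points pass it while sitting suspiciously close to the FE bound.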
Why the half-life is the substantive stake
ρ̂ = 0.626 (FE) implies an employment-shock half-life of ≈ 1.5 years; the defended 0.927 implies ≈ 9 years; OLS's 0.962 implies ≈ 18. Same regression, same data — the estimator choice alone moves the implied labor-market adjustment speed by a factor of twelve. That is what the ladder is for: not picking the prettiest p-value, but picking the rung whose bias story survives scrutiny.
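The half-lives quoted above follow from the standard AR(1) formula: a shock decays as ρ^h, so the half-life solves ρ^h = 0.5, giving h = ln 0.5 / ln ρ. A minimal sketch (the function name is illustrative, and the formula ignores the covariates in the post's model):

```python
import math


def half_life(rho: float) -> float:
    """Periods until an AR(1) shock has decayed to half its initial size."""
    if not 0.0 < rho < 1.0:
        raise ValueError("half-life is defined here only for 0 < rho < 1")
    return math.log(0.5) / math.log(rho)


if __name__ == "__main__":
    for label, rho in (("FE", 0.626), ("System GMM", 0.927), ("OLS", 0.962)):
        print(f"{label:10s} rho={rho:.3f}  half-life={half_life(rho):.1f} years")
```

Because ln ρ approaches zero as ρ approaches 1, small differences in ρ̂ near the top of the bracket translate into large differences in implied adjustment speed.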
AR(1), AR(2), Hansen — and the proliferation experiment
Every dynamic-panel GMM table ends with the same three tests, and each has a non-obvious reading: AR(1) must reject (it is mechanical), AR(2) must not reject (it protects the t−2 instruments), and Hansen is two-tailed in spirit — p < 0.05 flags invalid instruments, but p drifting toward 1 flags an overwhelmed test. Below are the post's real values, the six-specification proliferation experiment that proves the Hansen drift on this very data, and a short quiz. Read the bars, hover the scatter, then test yourself.
The three tests, real values
Diff GMM (two-step, 91 instruments) vs the headline Sys GMM (two-step, 32 collapsed). Green zone = where each p-value should be.
Instrument proliferation (six system-GMM specs)
Same model six times; only the lag window and collapsing change. Hover a point for ρ̂, SE, and AR(2).
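To see why the lag window and collapsing change the instrument count so sharply, here is a sketch that counts the GMM-style instruments a single variable contributes to the first-differenced equation. It assumes a balanced panel and a one-lag model, and it is not meant to reproduce the totals in the chart, which also include other variables, level-equation instruments, and any dummies. The function name and arguments are hypothetical.

```python
def gmm_instrument_count(n_periods, min_lag=2, max_lag=99, collapse=False):
    """Count lagged-level instruments for one variable in the differenced equation.

    Assumptions: balanced panel with periods 1..n_periods; the differenced
    equation exists for t = 3..n_periods; levels dated t - min_lag back to
    max(t - max_lag, 1) are used as instruments.
    """
    per_period = []
    for t in range(3, n_periods + 1):
        deepest_lag = min(max_lag, t - 1)          # cannot reach before period 1
        per_period.append(max(deepest_lag - min_lag + 1, 0))
    if collapse:
        # Collapsing keeps one column per lag depth instead of one per
        # (period, lag) pair.
        return max(per_period, default=0)
    return sum(per_period)


if __name__ == "__main__":
    for window in ((2, 3), (2, 5), (2, 99)):
        full = gmm_instrument_count(9, *window)
        slim = gmm_instrument_count(9, *window, collapse=True)
        print(f"lags {window[0]}:{window[1]}  uncollapsed={full:2d}  collapsed={slim}")
```

The uncollapsed count grows roughly quadratically in the panel length, while the collapsed count grows linearly.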
What to look for
- AR(1) p = 0.000 in both models — and that is good news. Differencing makes adjacent errors share εi,t−1, so AR(1) must reject when the model is right. The test that must stay clean is AR(2): 0.866 (diff) and 0.994 (sys) both pass.
- Read the uncollapsed circles left to right: 68 → 95 → 113 instruments pushes Hansen p from 0.035 to 0.235 with the model unchanged. That drift is mechanical overfitting, and its endpoint is the notorious p ≈ 1 red flag.
- Proliferation distorts both tails. The uncollapsed 2:3 spec is outright rejected (p = 0.0348) while its collapsed twin passes (p = 0.0957) — same lag window, same data, opposite verdicts, driven purely by instrument count.
- ρ̂ barely moves across all six cells — [0.921, 0.956]. The point estimate is robust; the test you would use to defend it is not. Collapsing (teal diamonds) buys an honest test at the price of a larger SE (0.0785 vs 0.0274 at 2:99).
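The "AR(1) must reject, AR(2) must not" reading can be checked with a few lines of simulation. The sketch below draws serially uncorrelated errors, first-differences them, and measures the correlation at lags 1 and 2; with i.i.d. errors theory gives −0.5 and 0. It illustrates the mechanism only and is not the Arellano-Bond test statistic itself; the helper name is hypothetical.

```python
import numpy as np


def lagged_correlation(diffs, k):
    """Correlation between differenced errors k periods apart, pooled over units."""
    later, earlier = diffs[:, k:].ravel(), diffs[:, :-k].ravel()
    return float(np.corrcoef(later, earlier)[0, 1])


rng = np.random.default_rng(0)
eps = rng.normal(size=(2000, 9))      # i.i.d. level errors: 2000 units, 9 periods
d_eps = np.diff(eps, axis=1)          # d_eps[t] = eps[t] - eps[t-1]

r1 = lagged_correlation(d_eps, 1)     # adjacent differences share one shock
r2 = lagged_correlation(d_eps, 2)     # differences two apart share none

if __name__ == "__main__":
    print(f"lag-1 correlation: {r1:+.3f} (theory: -0.5)")
    print(f"lag-2 correlation: {r2:+.3f} (theory:  0.0)")
```

If the level errors were themselves serially correlated, the lag-2 correlation would move away from zero, which is what AR(2) is there to detect.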
What would worry you? (click an answer)
AR(1) in differences: z = −4.49, p = 0.000
Your difference-GMM output strongly rejects the null of no first-order serial correlation in the differenced residuals.
AR(2) in differences: z = −0.01, p = 0.994
The second-order test could hardly be further from rejecting. Is a p-value this close to 1 suspicious here?
Hansen J: p = 0.97 with 130 instruments, N = 140
A colleague reports system GMM whose Hansen test "passes comfortably" at p = 0.97, using 130 instruments on 140 groups.
GMM ρ̂ = 0.679 when FE gave 0.626 — all tests pass
Difference GMM lands within one SE of the fixed-effects bound on a persistent series, with Hansen p = 0.211 and AR(2) p = 0.866.
The decoder in one table
Which estimator does your panel need?
The post's workflow compresses into three questions about your data. Answer them below and the recommendation updates live — then read the checklist that applies no matter which cell you land in. (This assumes your model has a lagged dependent variable; without one, ordinary FE with clustered SEs is usually fine.)
The practitioner checklist (from the post's Discussion)
- Run pooled OLS and fixed effects first and record the bracket [ρ̂FE, ρ̂OLS] — they are your measuring stick, not throwaway regressions (here: [0.626, 0.962]).
- Treat a difference-GMM estimate near the FE bound as a weak-instrument symptom (here: 0.679, within one SE of 0.626) — passing Hansen and AR(2) does not clear it.
- Prefer system GMM when persistence is high, say out loud that you are buying identification with mean stationarity, and check the estimate lands inside the bracket (here: 0.927).
- Read AR(1) as "must reject", AR(2) as "must not reject" — AR(2) is the test that protects your instruments (here: p = 0.994).
- Read Hansen two-tailed: below 0.05 is rejection, drifting toward 1 as instruments accumulate is overfitting (here: 0.462 with 32 instruments).
- Collapse instruments and report the count vs the number of groups (here: 32 vs 140 firms; uncollapsed hit 113) — accept the larger SE as the price of honesty.
- Replicate a published benchmark with your exact toolchain before trusting novel numbers (here: digit-for-digit match to the pydynpd README).
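The diagnostic items in the checklist can be turned into a small reading aid. This is a sketch, not part of the post's toolchain: the function name is hypothetical, the 0.90 cutoff for "Hansen drifting toward 1" is an arbitrary illustrative threshold, and the instruments-versus-groups comparison is the common rule of thumb rather than a formal test.

```python
def read_diagnostics(ar1_p, ar2_p, hansen_p, n_instruments, n_groups,
                     alpha=0.05, hansen_high=0.90):
    """Return a list of warnings for a dynamic-panel GMM table.

    hansen_high is an assumed, illustrative cutoff; pick your own.
    """
    flags = []
    if ar1_p >= alpha:
        flags.append("AR(1) does not reject: unexpected for a differenced model")
    if ar2_p < alpha:
        flags.append("AR(2) rejects: the t-2 instruments are in doubt")
    if hansen_p < alpha:
        flags.append("Hansen rejects: instruments look invalid")
    elif hansen_p > hansen_high:
        flags.append("Hansen p near 1: test may be overwhelmed by instruments")
    if n_instruments > n_groups:
        flags.append("more instruments than groups: collapse or shorten lags")
    return flags


if __name__ == "__main__":
    # Headline system GMM values quoted on this page.
    print(read_diagnostics(0.000, 0.994, 0.462, n_instruments=32, n_groups=140))
    # Hypothetical colleague's table: only the Hansen p and counts come from the quiz.
    print(read_diagnostics(0.000, 0.500, 0.970, n_instruments=130, n_groups=140))
```

An empty list is not a clean bill of health: the difference-GMM run on this page raises no flags here and is still suspect because it sits next to the FE bound, so the bracket check has to be done separately.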