Dynamic Panel Explorer — Interactive Lab

How persistent is firm employment? Two wrong answers that bracket the truth

If this year's employment depends on last year's, the model needs yesterday's outcome on the right-hand side, and that one change breaks both workhorse panel estimators in opposite directions. Pooled OLS loads the omitted firm effect onto the lag and lands above the true persistence ρ; the within (fixed-effects) estimator picks up Nickell bias and lands below it. Two wrong answers with known signs are a measuring stick: up to sampling noise, the truth should sit inside the bracket. In the post's real data the bracket is [0.626, 0.962]. Set the true ρ yourself, then drag T and N and watch the bracket form around it.

Why the bracket tightens as T grows (animation)

Average ρ̂ across simulated panels as the panel length T grows, holding true ρ = 0.80. The grey curve is pooled OLS: biased up by the firm effect α_i, and longer panels do not save it. The orange curve is fixed effects: biased down by order −1/T (Nickell), so it climbs toward the truth as T grows. The bracket between them is still wide where the post's data lives, at T ≈ 7–9 years per firm, and it only gets wider in shorter panels.

Simulate your own panel

True ρ: 0.80

Persistence of the simulated employment process. The post's defended estimate is ρ ≈ 0.93 — try the high end and watch both biases grow.

Years per firm T: 8

Nickell bias is of order −1/T: severe at T = 3, modest at T = 20. The Arellano-Bond panel has T = 7–9.

Number of firms N: 140

More firms shrink the noise in both estimates but do not remove either bias — Nickell bias survives N → ∞.

true ρ: 0.80 (the target both estimators miss)

OLS ρ̂ (pooled): value shown in the live lab; biased up because the lag absorbs α_i (post: 0.962)

FE ρ̂ (within): value shown in the live lab; Nickell bias ≈ −(1+ρ)/T (post: 0.626)

bracket width: value shown in the live lab; OLS ρ̂ − FE ρ̂ (post: 0.336)

What to look for

Drag T down to 3 or 4. The orange FE bar sinks far below the steel true-ρ line — at the post's T ≈ 7–9 the simulated FE bias is the same order as the real one (0.626 against a defended 0.927).
Drag T up to 20. FE climbs toward the truth while the grey OLS bar barely moves: Nickell bias is a short-panel disease, the OLS bias is a fixed-effect disease, and only one of them is cured by time.
Set ρ = 0.95. The bracket is widest when the series is most persistent — exactly when you most need GMM and when difference GMM's instruments are weakest (Tab 2 shows this on the real data).
Push N to 500 and reseed. The bars stop jittering but stay on the wrong side of the line. More firms buy precision, never consistency — the bracket is a bias problem, not a noise problem.
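
The experiment above can also be sketched offline. The following is a minimal simulation under stated assumptions, not the lab's actual code: a pure AR(1) with no covariates, α_i and ε drawn as independent standard normals, and every firm started at its steady state α_i/(1−ρ). The function names (`simulate_panel`, `pooled_ols`, `within_fe`) are illustrative.

```python
import numpy as np


def simulate_panel(rho, n_firms, n_years, burn_in=50, seed=0):
    """Simulate n[i,t] = rho * n[i,t-1] + alpha_i + eps[i,t] for |rho| < 1.

    Returns an array of shape (n_firms, n_years + 1): one initial level
    plus n_years usable observations per firm.
    """
    rng = np.random.default_rng(seed)
    alpha = rng.normal(size=n_firms)      # assumed: firm effects ~ N(0, 1)
    level = alpha / (1.0 - rho)           # assumed: start at the steady state
    columns = []
    for t in range(burn_in + n_years + 1):
        level = rho * level + alpha + rng.normal(size=n_firms)
        if t >= burn_in:
            columns.append(level)
    return np.column_stack(columns)


def pooled_ols(panel):
    """Slope from regressing n[i,t] on n[i,t-1] with a common intercept."""
    y = panel[:, 1:].ravel()
    x = panel[:, :-1].ravel()
    x_c = x - x.mean()
    y_c = y - y.mean()
    return float(x_c @ y_c / (x_c @ x_c))


def within_fe(panel):
    """Slope after subtracting each firm's own time mean (within estimator)."""
    y = panel[:, 1:]
    x = panel[:, :-1]
    y_d = y - y.mean(axis=1, keepdims=True)
    x_d = x - x.mean(axis=1, keepdims=True)
    return float((x_d * y_d).sum() / (x_d ** 2).sum())


panel = simulate_panel(rho=0.80, n_firms=140, n_years=8)
print("OLS:", round(pooled_ols(panel), 3), "FE:", round(within_fe(panel), 3))
```

Rerunning with a larger `n_firms` should steady both numbers without moving either one across the true ρ, which is the point of the last item above.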

Tab 2

The Estimator Ladder

The post's seven real estimates on one axis — OLS, FE, Anderson-Hsiao IV, difference and system GMM — with the bracket band shaded and a story behind every point.

Tab 3

Diagnostics Decoder

AR(1), AR(2), and Hansen read correctly — including why p = 0.000 can be good news and p = 0.99 can be a red flag — plus the instrument-proliferation experiment.

Tab 4

Method Chooser

Answer three questions about your own panel and get the estimator the post's workflow would recommend, with the full practitioner checklist.

Glossary (open a card if a term is unfamiliar)

Lagged dependent variable & ρ

The model puts yesterday's outcome on the right-hand side: n[i,t] = ρ·n[i,t−1] + β·x[i,t] + α_i + ε[i,t]. ρ is the persistence: the share of last year's employment level that carries over into this year, holding the other terms fixed. The post's headline: ρ̂ = 0.927, so ~93% of a shock survives into the next year.

Firm fixed effect α_i

A permanent firm-specific level (management quality, industry niche, plant size) sitting in the error. Last year's employment depends on it by construction, so the lag is correlated with the error no matter how many controls you add.

Nickell bias

The downward bias of ρ̂ when fixed effects are applied to a dynamic model on a short panel. Order −(1+ρ)/T. With T ≈ 7–9 here, FE returns 0.626 against a defended 0.927.
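
As a worked example of the order-of-magnitude formula in this card, the sketch below evaluates −(1+ρ)/T at a few panel lengths. It uses only the leading term exactly as stated above; the exact expression has further terms and covariates change it, so treat the numbers as rough guides. The helper name `nickell_approx` is illustrative.

```python
def nickell_approx(rho, n_periods):
    """Leading-order Nickell bias of the within estimator: -(1 + rho) / T."""
    return -(1.0 + rho) / n_periods


# True rho = 0.80 at the short, medium and long panel lengths used in the lab.
for n_periods in (3, 8, 20):
    print(n_periods, round(nickell_approx(0.80, n_periods), 3))

# The post's setting: rho near 0.927 with T between 7 and 9.
for n_periods in (7, 9):
    print(n_periods, round(nickell_approx(0.927, n_periods), 3))
```

At ρ = 0.927 the approximation gives roughly −0.28 at T = 7 and −0.21 at T = 9, the same order as the observed gap 0.626 − 0.927 = −0.301.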

The bias bracket (Bond 2002)

OLS is biased up and FE is biased down, so a consistent estimator should land between them, up to sampling error. Here: [0.626, 0.962]. A GMM estimate hugging the FE floor is a weak-instrument symptom even when every printed test passes.

Sequential exogeneity

The identification assumption behind the lag instruments: past values of the variables must be uncorrelated with future shocks. Partially testable — the AR(2) test guards exactly this for the t−2 instruments.

Difference GMM (Arellano-Bond 1991)

First-difference away α_i, then instrument Δn[i,t−1] with all available lagged levels (t−2 and deeper), weighted optimally. Here: 91 instruments, ρ̂ = 0.679 — hugging the FE bound because lagged levels barely predict differences when ρ is near 1.
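
The weak-instrument claim in this card can be illustrated with a small simulation. This is a sketch under assumptions, not the post's code: a pure AR(1) with standard-normal α_i and ε, and the pooled raw correlation between the lagged level n[i,t−2] and the lagged difference Δn[i,t−1] used as a crude stand-in for first-stage strength. The function name `first_stage_corr` is illustrative.

```python
import numpy as np


def first_stage_corr(rho, n_firms=2000, n_years=8, burn_in=200, seed=0):
    """Correlation between n[i,t-2] and dn[i,t-1] in a simulated AR(1) panel."""
    rng = np.random.default_rng(seed)
    alpha = rng.normal(size=n_firms)      # assumed: firm effects ~ N(0, 1)
    level = alpha / (1.0 - rho)           # assumed: start at the steady state
    columns = []
    for t in range(burn_in + n_years):
        level = rho * level + alpha + rng.normal(size=n_firms)
        if t >= burn_in:
            columns.append(level)
    panel = np.column_stack(columns)      # shape (n_firms, n_years)

    # Column s plays the role of n[i,t-2]; column s+1 minus column s is dn[i,t-1].
    lagged_level = panel[:, :-1].ravel()
    lagged_diff = (panel[:, 1:] - panel[:, :-1]).ravel()
    return float(np.corrcoef(lagged_level, lagged_diff)[0, 1])


for rho in (0.5, 0.8, 0.95):
    print(rho, round(first_stage_corr(rho), 3))
```

Under these assumptions the correlation is negative and shrinks toward zero as ρ approaches 1, which is why lagged levels carry little information about differences on a persistent series.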

System GMM & mean stationarity (Blundell-Bond 1998)

Stacks the differenced equation with the levels equation (instrumented by lagged differences), buying identification strength from one extra untestable assumption: firms' initial deviations from steady state are uncorrelated with α_i. Here: ρ̂ = 0.927 with 32 collapsed instruments — the headline.

Instrument proliferation & collapsing

The instrument count grows quadratically with T and can approach the number of groups (Roodman's ceiling: here 140 firms), overfitting the endogenous variables and disarming the Hansen test. Collapsing keeps one instrument per lag depth instead of one per lag-and-period. Here: 113 → 32 instruments; Hansen p 0.235 → 0.462; SE honestly larger.
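
To see the quadratic growth and what collapsing does to it, the sketch below counts the lag instruments for a single variable in the differenced equation only. It is an illustration with a hypothetical helper (`lag_instrument_count`), and it will not reproduce the post's 113 or 32, which also depend on the other instrumented regressors, the levels equation, and any exogenous instruments.

```python
def lag_instrument_count(n_periods, min_lag=2, max_lag=None, collapse=False):
    """Count lag instruments for one variable in the differenced equation.

    Periods are numbered 1..n_periods. At period t, lag k is usable when
    min_lag <= k <= t - 1 (the level at t - k must exist), optionally
    capped at max_lag. Uncollapsed: one column per (period, lag) pair.
    Collapsed: one column per lag depth.
    """
    columns = set()
    for t in range(1, n_periods + 1):
        deepest = t - 1 if max_lag is None else min(max_lag, t - 1)
        for lag in range(min_lag, deepest + 1):
            columns.add(lag if collapse else (t, lag))
    return len(columns)


for n_periods in (5, 9, 15):
    full = lag_instrument_count(n_periods)
    collapsed = lag_instrument_count(n_periods, collapse=True)
    print(n_periods, full, collapsed)
```

With all available lags the uncollapsed count is (T−2)(T−1)/2, while the collapsed count is T−2, so the gap widens quickly as T grows.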

Seven estimators, one parameter — the post's real results

Every number below comes from the post's production run on the classic Arellano-Bond (1991) panel — 140 UK manufacturing firms, 1976–1984 (estimates_summary.csv). All seven estimators target the same ρ; they disagree because each treats the firm effect α_i differently. The shaded band is the OLS–FE bracket [0.626, 0.962]; the dashed line is the unit root ρ = 1. Toggle estimators on and off, hover a point for its numbers, and click it for its story.

Click any estimate above for its story

Each point on the ladder has a one-paragraph biography: what the estimator does to the firm effect, why it lands where it lands, and what the post concluded about it. Click a point (or its CI bar) to read it here.

What to look for

The bracket band does the first screening. OLS (0.962) and FE (0.626) define it; Anderson-Hsiao (1.233) lands entirely outside it, above the unit root — consistent in theory, useless in practice with a CI 1.87 wide.
Both difference-GMM points hug the floor. 0.708 and 0.679 sit within one SE of the FE bound — Bond's weak-instrument symptom. They pass every printed test (Hansen p = 0.211, AR(2) p = 0.866) and are still not to be trusted.
System GMM lands in the upper half of the band. 0.902 (one-step) and the headline 0.927 (two-step, 32 collapsed instruments) — the only estimates that pass the bracket check and the printed diagnostics together.
The headline CI touches the dashed line. [0.773, 1.081] includes ρ = 1: the post can defend "persistence ≈ 0.93", not "employment is stationary". Honest reporting keeps the caveat attached.
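
The two screening steps in this list, the bracket check and the interval check against the unit root, reduce to a few lines of arithmetic. The sketch below assumes a normal-approximation 95% interval and assumes that the standard error 0.0785 quoted in Tab 3 for the collapsed specification belongs to the headline estimate; the helper names are illustrative.

```python
def wald_ci(estimate, std_error, z=1.96):
    """Normal-approximation confidence interval (assumed 95% with z = 1.96)."""
    return estimate - z * std_error, estimate + z * std_error


def inside_bracket(estimate, fe_bound, ols_bound):
    """True when the estimate lies between the FE floor and the OLS ceiling."""
    return fe_bound <= estimate <= ols_bound


FE_BOUND, OLS_BOUND = 0.626, 0.962

low, high = wald_ci(0.927, 0.0785)
print("headline CI:", round(low, 3), round(high, 3))
print("CI includes unit root:", low <= 1.0 <= high)
print("system GMM inside bracket:", inside_bracket(0.927, FE_BOUND, OLS_BOUND))
print("Anderson-Hsiao inside bracket:", inside_bracket(1.233, FE_BOUND, OLS_BOUND))
```

Under those assumptions the interval matches the [0.773, 1.081] reported above.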

Why the half-life is the substantive stake

ρ̂ = 0.626 (FE) implies an employment-shock half-life of ≈ 1.5 years; the defended 0.927 implies ≈ 9 years; OLS's 0.962 implies ≈ 18. Same regression, same data — the estimator choice alone moves the implied labor-market adjustment speed by a factor of twelve. That is what the ladder is for: not picking the prettiest p-value, but picking the rung whose bias story survives scrutiny.
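
The half-lives quoted above follow from the standard AR(1) formula: a shock decays as ρ^h, so it halves when h = ln(0.5)/ln(ρ). A minimal check, with an illustrative helper name:

```python
import math


def half_life(rho):
    """Years until an AR(1) shock decays to half its size: ln(0.5) / ln(rho)."""
    if not 0.0 < rho < 1.0:
        raise ValueError("half-life is defined here only for 0 < rho < 1")
    return math.log(0.5) / math.log(rho)


for label, rho in (("FE", 0.626), ("system GMM", 0.927), ("OLS", 0.962)):
    print(label, round(half_life(rho), 2))

print("OLS / FE ratio:", round(half_life(0.962) / half_life(0.626), 1))
```

This gives about 1.5, 9.1 and 17.9 years, and a ratio of about 12 between the OLS and FE readings.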

AR(1), AR(2), Hansen — and the proliferation experiment

Every dynamic-panel GMM table ends with the same three tests, and each has a non-obvious reading: AR(1) must reject (it is mechanical), AR(2) must not reject (it protects the t−2 instruments), and Hansen is two-tailed in spirit — p < 0.05 flags invalid instruments, but p drifting toward 1 flags an overwhelmed test. Below are the post's real values, the six-specification proliferation experiment that proves the Hansen drift on this very data, and a short quiz. Read the bars, hover the scatter, then test yourself.

The three tests, real values

Diff GMM (two-step, 91 instruments) vs the headline Sys GMM (two-step, 32 collapsed). Green zone = where each p-value should be.

Instrument proliferation (six system-GMM specs)

Same model six times; only the lag window and collapsing change. Hover a point for ρ̂, SE, and AR(2).

What to look for

AR(1) p = 0.000 in both models, and that is good news. Differencing makes adjacent differenced errors share ε[i,t−1], so AR(1) must reject when the model is right. The test that must stay clean is AR(2): 0.866 (diff) and 0.994 (sys) both pass.
Read the uncollapsed circles left to right: 68 → 95 → 113 instruments pushes Hansen p from 0.035 to 0.235 with the model unchanged. That drift is mechanical overfitting, and its endpoint is the notorious p ≈ 1 red flag.
Proliferation distorts both tails. The uncollapsed 2:3 spec is outright rejected (p = 0.0348) while its collapsed twin passes (p = 0.0957) — same lag window, same data, opposite verdicts, driven purely by instrument count.
ρ̂ barely moves across all six cells — [0.921, 0.956]. The point estimate is robust; the test you would use to defend it is not. Collapsing (teal diamonds) buys an honest test at the price of a larger SE (0.0785 vs 0.0274 at 2:99).
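
Why AR(1) rejection is mechanical can be checked directly. The sketch below assumes serially uncorrelated ε (the case in which the model is right), differences it, and measures the autocorrelation of the differenced errors at lags 1 and 2. It illustrates the logic only and is not the Arellano-Bond test statistic itself; `lag_autocorr` is an illustrative helper.

```python
import numpy as np


def lag_autocorr(x, lag):
    """Pooled correlation between x[i,t] and x[i,t-lag] across all firms."""
    current = x[:, lag:].ravel()
    lagged = x[:, :-lag].ravel()
    return float(np.corrcoef(current, lagged)[0, 1])


rng = np.random.default_rng(0)
eps = rng.normal(size=(5000, 9))   # assumed: i.i.d. errors, 9 periods per firm
d_eps = np.diff(eps, axis=1)       # d_eps[i,t] = eps[i,t] - eps[i,t-1]

r1 = lag_autocorr(d_eps, 1)        # theory: -0.5, adjacent differences share one eps
r2 = lag_autocorr(d_eps, 2)        # theory: 0, no shared eps two periods apart
print(round(r1, 3), round(r2, 3))
```

A first-order correlation near −0.5 is built in by the differencing; only a nonzero second-order correlation would signal that ε itself is serially correlated.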

What would worry you? (click an answer)

AR(1) in differences: z = −4.49, p = 0.000

Your difference-GMM output strongly rejects the null of no first-order serial correlation in the differenced residuals.

AR(2) in differences: z = −0.01, p = 0.994

The second-order test could hardly be further from rejecting. Is a p-value this close to 1 suspicious here?

Hansen J: p = 0.97 with 130 instruments, N = 140

A colleague reports system GMM whose Hansen test "passes comfortably" at p = 0.97, using 130 instruments on 140 groups.

GMM ρ̂ = 0.679 when FE gave 0.626 — all tests pass

Difference GMM lands within one SE of the fixed-effects bound on a persistent series, with Hansen p = 0.211 and AR(2) p = 0.866.

The decoder in one table

AR(1) in differences: must REJECT; rejection is mechanical good news (ours: p = 0.000)

AR(2) in differences: must NOT reject; validates the t−2 instruments (ours: p = 0.994)

Hansen J: two-tailed; p < 0.05 invalid, p → 1 overwhelmed (ours: 0.462 with 32 instruments)

Instrument count: report it vs the number of groups; collapse when it grows (ours: 32 vs 140 firms)

Which estimator does your panel need?

The post's workflow compresses into three questions about your data. Answer them below and the recommendation updates live — then read the checklist that applies no matter which cell you land in. (This assumes your model has a lagged dependent variable; without one, ordinary FE with clustered SEs is usually fine.)

Is T small? (< ~20 periods)

Options: Yes (short panel) / No (long panel)

Nickell bias is order −1/T. The post's panel: T = 7–9 → firmly "small".

Is the series persistent? (ρ near 1)

Options: Yes (ρ ≳ 0.8) / No (ρ moderate)

Check your own OLS-FE bracket: if both endpoints are high, the series is persistent (post: [0.626, 0.962]).

Are other regressors endogenous or predetermined?

Options: Yes (e.g. wages respond to shocks) / No (strictly exogenous)

The post instruments w and k with their own lags via gmm(w, ...) gmm(k, ...); strictly exogenous regressors can enter as iv(...) instead.

Recommendation

The practitioner checklist (from the post's Discussion)

Run pooled OLS and fixed effects first and record the bracket [ρ̂_FE, ρ̂_OLS] — they are your measuring stick, not throwaway regressions (here: [0.626, 0.962]).
Treat a difference-GMM estimate near the FE bound as a weak-instrument symptom (here: 0.679, within one SE of 0.626) — passing Hansen and AR(2) does not clear it.
Prefer system GMM when persistence is high, say out loud that you are buying identification with mean stationarity, and check the estimate lands inside the bracket (here: 0.927).
Read AR(1) as "must reject", AR(2) as "must not reject" — AR(2) is the test that protects your instruments (here: p = 0.994).
Read Hansen two-tailed: below 0.05 is rejection, drifting toward 1 as instruments accumulate is overfitting (here: 0.462 with 32 instruments).
Collapse instruments and report the count vs the number of groups (here: 32 vs 140 firms; uncollapsed hit 113) — accept the larger SE as the price of honesty.
Replicate a published benchmark with your exact toolchain before trusting novel numbers (here: digit-for-digit match to the pydynpd README).
