Building a composite Health Index from Life Expectancy and Infant Mortality by computing PCA manually on simulated data for 50 countries, then verifying the result against scikit-learn
Building a Human Development Index that is comparable across two time periods using pooled PCA on real sub-national data for 153 South American regions, and contrasting it with per-period PCA to show why pooled standardization is essential for temporal comparisons
Estimating regression models with high-dimensional fixed effects using PyFixest, from simple OLS through two-way FE, instrumental variables, panel data, and event studies
Estimating causal treatment effects using Difference-in-Differences with the diff-diff package, from the classic 2x2 design through staggered adoption with Callaway-Sant'Anna and HonestDiD sensitivity analysis
A beginner-friendly, comprehensive introduction to Random Forest regression for continuous data, evaluated end-to-end with 5-fold cross-validation and out-of-fold predictions on Bolivian satellite imagery
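As an illustration of the first item in this list, the manual PCA workflow might look roughly like the sketch below. It is not taken from the tutorial itself: the simulated values, the random seed, the variable names, and the final min-max rescaling to a 0-1 index are all assumptions made for this example. It standardizes two indicators, takes the leading eigenvector of their covariance matrix, and checks the resulting scores against scikit-learn's `PCA`.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Simulated data for 50 hypothetical countries (values are illustrative only).
rng = np.random.default_rng(42)
n_countries = 50
life_expectancy = rng.normal(70, 8, n_countries)
# Infant mortality is constructed to fall as life expectancy rises.
infant_mortality = 120 - 1.3 * life_expectancy + rng.normal(0, 5, n_countries)
X = np.column_stack([life_expectancy, infant_mortality])

# Step 1: standardize each indicator (population std, as StandardScaler does).
Z = (X - X.mean(axis=0)) / X.std(axis=0)

# Step 2: eigendecomposition of the covariance matrix of the standardized data.
cov = np.cov(Z, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns ascending eigenvalues
order = np.argsort(eigvals)[::-1]        # reorder to descending
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Step 3: project onto the first principal component.
w = eigvecs[:, 0]
if w[0] < 0:                             # orient so higher life expectancy raises the score
    w = -w
pc1_manual = Z @ w
explained_manual = eigvals[0] / eigvals.sum()

# Step 4: rescale to a 0-1 index (an assumed convention for this sketch).
health_index = (pc1_manual - pc1_manual.min()) / (pc1_manual.max() - pc1_manual.min())

# Step 5: verify against scikit-learn, aligning the arbitrary sign of the component.
pca = PCA(n_components=1)
pc1_sklearn = pca.fit_transform(StandardScaler().fit_transform(X))[:, 0]
if pca.components_[0, 0] < 0:
    pc1_sklearn = -pc1_sklearn

print("Scores match:", np.allclose(pc1_manual, pc1_sklearn))
print("Explained variance ratios match:",
      np.isclose(explained_manual, pca.explained_variance_ratio_[0]))
```

The sign of a principal component is arbitrary, so both the manual and the scikit-learn scores are oriented the same way before they are compared. The pooled PCA described in the second item would follow the same steps, with the standardization and eigendecomposition computed once on the stacked observations from both periods rather than separately for each period.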