Recently Published
Actuarial Claim Cost Modeling: Comparing Linear Regression, Tweedie GLM, and XGBoost Ensembles
A practical comparison of four approaches for modeling insurance claim costs – Linear Regression, Two‑Part (Logistic+Gamma), Tweedie GLM, and XGBoost with Tweedie loss. Using simulated car insurance data, this project demonstrates the actuarial edge of the Tweedie distribution for zero‑inflated, skewed claim data. All models perform similarly, with XGBoost achieving a marginal 2.21% RMSE improvement. The real value lies in feature importance – revealing past claims and mileage as the strongest predictors.
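For readers new to the Tweedie family: with power parameter 1 < p < 2 it is a compound Poisson-gamma distribution, which puts a point mass at zero while keeping a continuous, right-skewed positive part. That is why it suits claim costs. The minimal Python sketch below shows the Tweedie unit deviance, which is equivalent to the Tweedie negative log-likelihood up to terms that do not depend on the mean. It also shows the two-part pure premium, P(claim) × E[cost | claim]. The function names and sample numbers are hypothetical illustrations, not code or data from the project.

```python
import math
from typing import Sequence


def tweedie_deviance(y: float, mu: float, p: float) -> float:
    """Tweedie unit deviance for 1 < p < 2 (compound Poisson-gamma case)."""
    if not 1.0 < p < 2.0:
        raise ValueError("this sketch only covers 1 < p < 2")
    if y < 0 or mu <= 0:
        raise ValueError("need y >= 0 and mu > 0")
    # The y**(2 - p) term is 0 when y == 0, so zero-claim policies need
    # no special casing and no log(0).
    term_y = y ** (2 - p) / ((1 - p) * (2 - p))
    term_cross = y * mu ** (1 - p) / (1 - p)
    term_mu = mu ** (2 - p) / (2 - p)
    return 2.0 * (term_y - term_cross + term_mu)


def mean_tweedie_deviance(ys: Sequence[float], mus: Sequence[float], p: float) -> float:
    """Average unit deviance over a set of policies."""
    return sum(tweedie_deviance(y, m, p) for y, m in zip(ys, mus)) / len(ys)


def two_part_pure_premium(p_claim: float, mean_severity: float) -> float:
    """Two-part model: E[cost] = P(cost > 0) * E[cost | cost > 0]."""
    return p_claim * mean_severity


if __name__ == "__main__":
    # Hypothetical portfolio: mostly zero claims plus one large claim.
    claims = [0.0, 0.0, 0.0, 0.0, 2500.0]
    preds = [400.0, 350.0, 500.0, 450.0, 600.0]
    print(mean_tweedie_deviance(claims, preds, p=1.5))
    # Hypothetical logistic probability 0.25 and Gamma mean severity 4000.
    print(two_part_pure_premium(0.25, 4000.0))
```

The deviance is zero when the prediction equals the observed cost and grows as they diverge. A Tweedie GLM or a Tweedie boosting objective therefore scores zero and positive claims on one scale, where the two-part approach needs two separate models.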
Choosing the Right Forecasting Tool: Linear Regression vs Ensemble Boosting
A hands-on data science project comparing Linear Regression and XGBoost
for retail sales forecasting. This project reveals an unexpected outcome –
the simpler model wins – and explores why model selection matters more than
algorithm complexity in real-world applications.
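One way to make "the simpler model wins" concrete is to score every candidate on the same chronological holdout instead of on its training fit. The standard-library sketch below fits a one-feature least-squares trend and compares its holdout RMSE with a naive last-value baseline. The series, the 80/20 split, and all function names are hypothetical. A gradient-boosted model would enter the comparison as one more entry in the `models` dictionary.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

# A forecaster takes the training history and a horizon and returns predictions.
Forecaster = Callable[[List[float], int], List[float]]


def chronological_split(series: Sequence[float], test_frac: float = 0.2) -> Tuple[List[float], List[float]]:
    """Hold out the most recent observations; time series are never shuffled."""
    n_test = max(1, round(len(series) * test_frac))
    cut = len(series) - n_test
    return list(series[:cut]), list(series[cut:])


def fit_linear_trend(train: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares of sales on the time index t = 0..n-1."""
    n = len(train)
    t_mean = (n - 1) / 2
    y_mean = sum(train) / n
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(train))
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    slope = sxy / sxx
    return y_mean - slope * t_mean, slope


def linear_forecast(train: List[float], horizon: int) -> List[float]:
    intercept, slope = fit_linear_trend(train)
    return [intercept + slope * (len(train) + h) for h in range(horizon)]


def naive_forecast(train: List[float], horizon: int) -> List[float]:
    """Baseline: repeat the last observed value."""
    return [train[-1]] * horizon


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def compare(series: Sequence[float], models: Dict[str, Forecaster], test_frac: float = 0.2) -> Dict[str, float]:
    """Score every candidate on the same chronological holdout."""
    train, test = chronological_split(series, test_frac)
    return {name: rmse(test, f(train, len(test))) for name, f in models.items()}


if __name__ == "__main__":
    weekly_sales = [120, 125, 123, 130, 134, 133, 140, 142, 141, 148, 151, 150]  # hypothetical
    print(compare(weekly_sales, {"linear": linear_forecast, "naive": naive_forecast}))
```

Every model gets the same split and the same metric, so a lower holdout RMSE for the simpler model is a like-for-like result rather than an artifact of how each model was evaluated.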
Zero-Inflated Data
A guided walkthrough of modeling count data with excess zeros.
This project compares Poisson, Negative Binomial, Zero‑Inflated (ZIP/ZINB),
Hurdle, and Bayesian (JAGS) models on simulated cargo shipment data and
the real bioChemists dataset. Model selection is driven by AIC and DIC,
demonstrating when structural zero processes are justified – and when
simpler models win. Includes practical notes on NaN AIC/DIC issues and
numerical stability.
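The zero-inflated and hurdle structures differ most clearly in how they assign probability to zero. A zero-inflated Poisson mixes a structural-zero component (probability π) with an ordinary Poisson, so P(Y = 0) = π + (1 − π)e^(−λ). A Poisson hurdle model instead routes every zero through a binary part and draws positive counts from a zero-truncated Poisson. The Python sketch below evaluates both log-likelihoods and AIC = 2k − 2 log L. It uses hypothetical function names, toy counts, and a crude grid search standing in for proper maximum likelihood. The ZIP zero term is computed with a log-sum-exp. That is one common guard against the log(0) underflow that can make a log-likelihood, and any criterion built on it, infinite or NaN. It is offered as an illustration, not as the project's own fix.

```python
import math
from typing import Sequence


def log_add_exp(a: float, b: float) -> float:
    """Stable log(exp(a) + exp(b)); accepts -inf for either argument."""
    if a == -math.inf:
        return b
    if b == -math.inf:
        return a
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def poisson_logpmf(k: int, lam: float) -> float:
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def zip_loglik(counts: Sequence[int], lam: float, pi: float) -> float:
    """Zero-inflated Poisson: a zero comes from the structural part or the Poisson."""
    log_pi = math.log(pi) if pi > 0 else -math.inf
    log_1m_pi = math.log1p(-pi)
    total = 0.0
    for k in counts:
        if k == 0:
            # log(pi + (1 - pi) * exp(-lam)), evaluated in log space
            total += log_add_exp(log_pi, log_1m_pi - lam)
        else:
            total += log_1m_pi + poisson_logpmf(k, lam)
    return total


def hurdle_loglik(counts: Sequence[int], lam: float, p_zero: float) -> float:
    """Poisson hurdle: binary zero/positive part, then a zero-truncated Poisson."""
    log_trunc = math.log(-math.expm1(-lam))  # log(1 - exp(-lam)), stable for small lam
    total = 0.0
    for k in counts:
        if k == 0:
            total += math.log(p_zero)
        else:
            total += math.log1p(-p_zero) + poisson_logpmf(k, lam) - log_trunc
    return total


def aic(loglik: float, n_params: int) -> float:
    return 2 * n_params - 2 * loglik


if __name__ == "__main__":
    counts = [0, 0, 0, 0, 1, 2, 0, 3, 0, 1, 0, 4]  # hypothetical toy data
    lams = [0.1 * i for i in range(1, 60)]
    probs = [0.02 * i for i in range(1, 50)]
    for name, ll in (("ZIP", zip_loglik), ("Hurdle", hurdle_loglik)):
        # Crude grid search in place of a real optimizer; both models have 2 parameters.
        best_ll, lam, prob = max((ll(counts, l, p), l, p) for l in lams for p in probs)
        print(name, round(lam, 2), round(prob, 2), round(aic(best_ll, 2), 2))
```

With π = 0 the ZIP likelihood collapses to the plain Poisson. This is why comparing AIC (or DIC in the Bayesian case) indicates whether the extra structural-zero parameter is worth its cost.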