Dan Watkins

Recently Published

Statistical Inference, Project 2
Exploring the CLT via simulation and using 1- and 2-sample t-tests to make statistical inferences.
Machine Learning: Combining Predictors
An example of creating a generalized additive model using a linear model and a random forest model.
Machine Learning: Random Forests and Boosting
From week 3 of Coursera's "Practical Machine Learning" in the Data Science Specialization.
Machine Learning: Trees and Bagging
Notes from week 3 of the Coursera Data Science Specialization series.
Machine Learning: Regression Modeling
Example of creating single- and multivariate regression models to predict wage data.
Machine Learning: Preprocessing and PCA/SVD
More machine learning, now using principal component analysis to preprocess the data.
Practical Machine Learning, Week 2
Data slicing, k-fold cross-validation, k-nearest neighbor imputation, variable transformations (standardization, Box-Cox transform).
Basic Machine Learning Example
Using the caret package and the kernlab "spam" data set, I fit a basic GLM to predict whether an email is spam or not spam.
Data Cleaning Example
An assignment for a Data Cleaning class. The script downloads a machine-learning data set with a total of 10299 observations, unzips it, reads in the relevant data from 5 files, and combines them into a single tidy data set ordered by participant and activity.
Reproducible Research, Project 1
Basic analysis of a simple activity-tracking data set, available in code.
Reproducible Research, Project 2
This analysis is an exercise in literate programming using the knitr package in R. It uses the U.S. National Oceanic and Atmospheric Administration's (NOAA) storm database, with documentation available at: The analysis looks at 902297 observations of severe weather data with the intention of answering questions related to fatalities, injuries, and financial impact.