Exploring the CLT via simulation and using 1- and 2-sample t-tests to make statistical inferences.
An example of creating a generalized additive model using a linear model and a random forest model.
From week 3 of Coursera's "Practical Machine Learning" in the Data Science Specialization.
Notes from week 3 of the Coursera Data Science Specialization series.
Example of creating single- and multivariate regression models to predict wage data.
More machine learning, now using principal component analysis to preprocess the data.
Data slicing, k-fold cross-validation, k-nearest-neighbor imputation, and variable transformations (standardization, Box-Cox transform).
Using the caret package and the kernlab "spam" data set, I fit a basic GLM to predict whether an email is spam or not spam.
An assignment for a Data Cleaning class. The script downloads a machine-learning data set with a total of 10299 observations, unzips it, reads in the relevant data from 5 files, and combines them into a single tidy data set ordered by participant and activity.
Basic analysis of a simple activity-tracking data set, available in code.
This analysis is an exercise in literate programming using the knitr package in R. It uses the U.S. National Oceanic and Atmospheric Administration's (NOAA) storm database (documentation: https://d396qusza40orc.cloudfront.net/repdata%2Fpeer2_doc%2Fpd01016005curr.pdf). The analysis looks at 902297 observations of severe weather data with the intention of answering questions related to fatalities, injuries, and financial impact.