Full derivation of the least squares solution for single-variable regression
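As a rough illustration of where such a derivation ends up: minimizing the sum of squared residuals for the line y = a + b·x gives the standard closed-form result b = S_xy / S_xx and a = ȳ − b·x̄. The sketch below computes those two quantities directly. The function name `fit_line` and the sample data are hypothetical and are not taken from the document itself.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) of the least squares line y = a + b*x.

    Hypothetical helper illustrating the standard closed-form solution;
    it is not code from the original document.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("xs and ys must have the same length, at least 2")

    x_bar = sum(xs) / n
    y_bar = sum(ys) / n

    # S_xy: sum of cross-deviations; S_xx: sum of squared x-deviations.
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    if s_xx == 0:
        raise ValueError("all x values are identical; slope is undefined")

    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return intercept, slope


if __name__ == "__main__":
    # Made-up points lying exactly on y = 1 + 2x.
    a, b = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
    print(f"intercept={a:.3f}, slope={b:.3f}")
```

Because the sample points lie exactly on a line, the fitted coefficients recover that line; with noisy data the same formulas give the line with the smallest sum of squared residuals.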
The purpose of this document is to explore the relationship between 311 calls and housing violations in New York City. After investigating their statistical properties and incorporating demographic variables, I develop several successful models for predicting housing violations in NYC zip codes. When each model is tested on a hold-out set, the best performer is a specialized Poisson regression method that accounts for 72 percent of the variation in housing violations.
My purpose is to take two directories of e-mails, one containing spam and the other containing ham, and develop a model that predicts whether an e-mail is spam or ham. After attempting to parse the e-mails to strip the header data, I use TF-IDF scores to create a feature set, split the data into training and test sets (75/25), train a Naive Bayes model, and then evaluate the model with accuracy, precision, recall, and F1 score.
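The four evaluation metrics named in this entry can be computed from the counts of true and false positives and negatives. The following is a minimal sketch, assuming "spam" is treated as the positive class. The function name `classification_metrics` and the example labels are hypothetical and do not come from the document's own analysis.

```python
def classification_metrics(y_true, y_pred, positive="spam"):
    """Compute accuracy, precision, recall, and F1 for a binary classifier.

    Hypothetical helper; treating "spam" as the positive class is an
    assumption made for illustration.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or absent.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)

    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Made-up labels, purely for demonstration.
    truth = ["spam", "spam", "ham", "ham", "spam"]
    preds = ["spam", "ham", "ham", "spam", "spam"]
    for name, value in classification_metrics(truth, preds).items():
        print(f"{name}: {value:.3f}")
```

Precision asks how many e-mails flagged as spam really were spam, while recall asks how many spam e-mails were caught; F1 is their harmonic mean, which is useful when the two classes are unevenly sized.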
An R implementation of the NYT Books API
Various simple transformations on the UCI repository's mushroom dataset, available at https://archive.ics.uci.edu/ml/datasets/Mushroom/.
Testing to ensure these template settings will carry over to RPubs correctly