## Recently Published

##### Deriving the Least Squares Solution

A full derivation of the least squares solution for single-variable regression.
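For orientation, the standard closed-form result that such a derivation arrives at is sketched below. The notation ($\beta_0$, $\beta_1$, $\bar{x}$, $\bar{y}$) is an assumption made here and may differ from the notation in the linked document.

$$
\begin{aligned}
S(\beta_0, \beta_1) &= \sum_{i=1}^{n} \left( y_i - \beta_0 - \beta_1 x_i \right)^2 \\
\frac{\partial S}{\partial \beta_0} = 0,\quad \frac{\partial S}{\partial \beta_1} = 0
\;&\Longrightarrow\;
\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad
\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}
\end{aligned}
$$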

##### Modeling Housing Violations in New York City

This document explores the relationship between 311 calls and housing violations in New York City. After investigating their statistical properties and incorporating demographic variables, I develop several models for predicting housing violations in NYC zip codes. When each model was tested on a hold-out set, the best performer was a special Poisson regression method that accounted for 72 percent of the variation in housing violations.

##### Data 607 – Project 4

Our purpose is to take two directories of e-mails, one containing spam and the other containing ham (legitimate mail), and develop a model to predict whether an e-mail is spam or ham.
After attempting to parse the e-mails to strip out the header data, I will use TF-IDF scores to create a feature set, split the data into train and test sets (75/25), train a Naive Bayes model, and then evaluate the model using accuracy, precision, recall, and F1 score.
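The workflow described above can be sketched in a few lines. This is a minimal illustration in plain Python, not the project's own code (the listing suggests the coursework itself is in R). It assumes the message bodies have already been separated from their headers, uses whitespace tokenization, and applies a smoothed IDF and a multinomial Naive Bayes with Laplace smoothing. All function names (`fit_idf`, `tfidf`, `train_nb`, `predict`, `evaluate`, `run_pipeline`) are hypothetical.

```python
import math
import random
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokens are good enough for a sketch.
    return text.lower().split()


def fit_idf(docs):
    """Smoothed inverse document frequency, fitted on training documents only."""
    n = len(docs)
    df = Counter(t for d in docs for t in set(tokenize(d)))
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}


def tfidf(doc, idf):
    """TF-IDF weights for one document; terms unseen in training are dropped."""
    counts = Counter(tokenize(doc))
    return {t: c * idf[t] for t, c in counts.items() if t in idf}


def train_nb(vectors, labels, vocab, alpha=1.0):
    """Multinomial Naive Bayes over TF-IDF weights with Laplace smoothing."""
    n = len(labels)
    model = {}
    for label in set(labels):
        weights = Counter()
        for vec, y in zip(vectors, labels):
            if y == label:
                weights.update(vec)
        total = sum(weights.values()) + alpha * len(vocab)
        model[label] = {
            "log_prior": math.log(labels.count(label) / n),
            "log_lik": {t: math.log((weights[t] + alpha) / total) for t in vocab},
        }
    return model


def predict(model, vec):
    """Return the label with the highest log-posterior score."""
    def score(label):
        params = model[label]
        return params["log_prior"] + sum(
            w * params["log_lik"][t] for t, w in vec.items()
        )
    return max(sorted(model), key=score)


def evaluate(y_true, y_pred, positive="spam"):
    """Accuracy, precision, recall, and F1, treating `positive` as the target class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def run_pipeline(messages, labels, test_fraction=0.25, seed=0):
    """Shuffle, split 75/25 by default, fit on the training part, score the test part."""
    order = list(range(len(messages)))
    random.Random(seed).shuffle(order)
    n_test = max(1, round(len(order) * test_fraction))
    test_idx, train_idx = order[:n_test], order[n_test:]
    train_docs = [messages[i] for i in train_idx]
    train_labels = [labels[i] for i in train_idx]
    idf = fit_idf(train_docs)
    model = train_nb([tfidf(d, idf) for d in train_docs], train_labels, vocab=set(idf))
    predictions = [predict(model, tfidf(messages[i], idf)) for i in test_idx]
    return evaluate([labels[i] for i in test_idx], predictions)
```

Calling `run_pipeline(messages, labels)` with a list of message bodies and a parallel list of `"spam"`/`"ham"` labels returns a dictionary containing the four metrics. Fitting the IDF weights on the training split only keeps information from the test messages out of the features.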

##### DATA 607—Homework No. 7

An R implementation of the NYT Books API.

##### DATA 607 -- Homework No. 1

Various simple transformations on the UCI repository's mushroom dataset, available at https://archive.ics.uci.edu/ml/datasets/Mushroom/.

##### Homework Template

Testing to ensure these template settings will carry over to RPubs correctly.