Ben Horvath

Recently Published

Deriving the Least Squares Solution
Full derivation of the least squares solution for single-variable regression
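For orientation, the result such a derivation arrives at can be stated compactly. The notation below (the coefficients \beta_0 and \beta_1, the sample means \bar{x} and \bar{y}, and the sum of squared residuals S) is an assumption made here for illustration and may differ from the notation used in the document itself.

```latex
% Model (assumed notation): y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \quad i = 1, \dots, n
% Objective: choose \beta_0, \beta_1 to minimize the sum of squared residuals
S(\beta_0, \beta_1) = \sum_{i=1}^{n} \left( y_i - \beta_0 - \beta_1 x_i \right)^2

% Setting both partial derivatives to zero gives the normal equations
\frac{\partial S}{\partial \beta_0} = -2 \sum_{i=1}^{n} \left( y_i - \beta_0 - \beta_1 x_i \right) = 0
\qquad
\frac{\partial S}{\partial \beta_1} = -2 \sum_{i=1}^{n} x_i \left( y_i - \beta_0 - \beta_1 x_i \right) = 0

% Solving them yields the least squares estimates
\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}
\qquad
\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}
```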
Modeling Housing Violations in New York City
This document explores the relationship between 311 calls and housing violations in New York City. After investigating their statistical properties and incorporating demographic variables, I develop a number of successful models for predicting housing violations in NYC zip codes. When each model was tested on a hold-out set, the best was a special Poisson regression method that accounted for 72 percent of the variation in housing violations.
DATA 607—Discussion 11
Data 607 – Project 4
The purpose is to take two directories of e-mails, one containing spam and the other containing ham, and develop a model to predict whether an e-mail is spam or ham. After attempting to parse the e-mails to get rid of the header data, I will use TF-IDF scores to create a feature set, split the data into training and test sets (75/25), train a Naive Bayes model, and then evaluate the model using accuracy, precision, recall, and F1 score.
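The split and evaluation steps named above can be sketched in a few lines. This is a minimal illustration in plain Python, not the project's own code (which is presumably written in R): the function names, the seed, and the toy labels are all hypothetical, and the TF-IDF and Naive Bayes steps are left out.

```python
import random


def train_test_split(items, test_fraction=0.25, seed=0):
    """Shuffle a copy of items and return (train, test) lists.

    The 0.25 default mirrors the 75/25 split; the seed is an arbitrary choice.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def classification_metrics(y_true, y_pred, positive="spam"):
    """Return accuracy, precision, recall, and F1, treating `positive` as the target class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical labels, invented purely for illustration.
y_true = ["spam", "spam", "spam", "ham", "ham", "ham", "ham", "ham"]
y_pred = ["spam", "spam", "ham", "spam", "ham", "ham", "ham", "ham"]

train, test = train_test_split(range(8))
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Precision and recall are reported for a single positive class, here spam, so swapping the `positive` argument to "ham" gives the complementary view of the same predictions.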
DATA 607—Homework No. 7
An R implementation of the NYT Books API
DATA 607 -- Homework No. 3
DATA 607—Homework No. 2
DATA 607 -- Homework No. 1
Various simple transformations on the UCI repository's mushroom dataset, available at
Homework Template
Testing to ensure these template settings will carry over to RPubs correctly
Homework No. 2