wubr2000

Bruno Wu

Recently Published

STATS290 - Assignment 2
dplyr, package creation, data wrangling, flight map using Nathan Yau's code
Homework #3, Question 5 (Balog SUID# 05849374, Jaiswal SUID# 05961816, Wu SUID# 05173124)
Neural Nets using R. Answer to STATS 315B homework question.
Ensemble Learning for Kaggle Titanic Competition
Ensemble learning using logistic, boosting, and SVM.
Principal Components and Partial Least Squares for Kaggle Titanic Competition
Principal Components Regression and Partial Least Squares Regression
Regularization Methods for Kaggle Titanic Competition
Ridge and Lasso Regressions.
Exploring Non-linearity and Interaction Terms for Kaggle Titanic Competition
In which I found out that the non-linearity in the Sib/Spouse variable is HUGE! It's not overfitting either: adding this factor helps on the training set and also significantly improves predictive power on the test set. But WHY?
Logistic Regressions and Subset Selection for the Titanic Kaggle Competition
Includes initial data cleaning and some new variables. Used the Best Subset Selection method to pick variables but didn't do full CV.
Support Vector Machine for the Titanic Kaggle Competition
SVM using linear and radial kernels.
Decision Trees for the Titanic Kaggle Competition
Using Random Forest and Boosting. Boosting performed substantially better!
First Attempt at Kaggle March Madness Competition
This is my first attempt at creating models, making predictions, and submitting results to Kaggle's March Madness competition. It follows a tutorial from a blog.
Random Walks and One-sidedness…
"Anyone who considers arithmetical methods of producing random digits is, of course, in a state of sin." (John von Neumann)
Homework 4 for STATS216 by Bruno Pen Wu
My answers for HW4.
Runkeeper Map
From FlowingData tutorial. Will try this for Hong Kong.
Spam Filter using Naive Bayes Classifier
From the "ML for Hackers" book, Chapter 3. The most critical step is learning how to build a "Term Document Matrix": rows contain all the terms found in all of the documents, and columns are the emails (one per document). The tm package is very handy for text distilling.
NCI60 Data Example
From Lab 10 in ISLR. Using Unsupervised Learning techniques - PCA and Hierarchical Clustering.
PCA from ISLR Book (Lab 10)
Testing out R Presentation with this analysis
Support Vector Machine lecture
R-lab from SVM lecture in Stats 216
Linear Regression Analysis of NCAA Basketball Data
In-class example of Linear Regression Analysis for NCAA basketball data
Homework 1 for STATS216 by Bruno Pen Wu
Homework 1 for Stats 216 Stanford
Homework 2 for STATS216 by Bruno Pen Wu
Homework 2 for Stats 216 Stanford
Homework 3 for STATS216 by Bruno Pen Wu
My homework 3 answers for Stats 216 Stanford