Improved Classification with SuperLearner
Imagine a jar of marbles, and each person is asked to guess how many marbles it contains. Some individuals will guess over and some will guess under. What happens if you average the guesses? As technology has advanced, so have statistical models. There are now many choices of model for classification, including logistic regression, support vector machines, discriminant analysis, and classification trees. The difficulty lies in deciding which model to use and which tuning parameters to select for it. One solution that may improve classification rates is to combine all of the predictions into a single, improved prediction (Steinki and Mohammad 2015). This is the basic idea behind ensemble modeling.

In class, we discussed a variety of ensemble methods, including model averaging, bagging, random forests, boosting, bumping, and stacking. The focus of this report is stacking. I will introduce an R package, SuperLearner, fit a few base models using classification methods we learned in class, and evaluate the area under the curve (AUC) of the receiver operating characteristic (ROC) for each. An ROC curve plots the true positive rate (sensitivity) against the false positive rate (1 - specificity) across the possible cutpoints of a diagnostic test, showing the tradeoff between sensitivity and specificity. To compare models, we calculate the AUC; a higher AUC indicates a better fit. After fitting the base models, I will combine the base learners into a super learner and compare the final model to the base learners.
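The workflow described above can be sketched with a few lines of R. This is a minimal illustration, not the analysis from this report: the data here are simulated placeholders, and the learner library is an assumed example set of wrappers shipped with the SuperLearner package.

```r
# Minimal sketch of a stacked (super learner) fit, assuming a binary
# outcome Y and a data frame of predictors X. The data below are
# simulated for illustration only.
library(SuperLearner)

set.seed(1)
n <- 200
X <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
Y <- rbinom(n, 1, plogis(X$x1 - X$x2))

# Example library of base learners: logistic regression, random
# forest, and a simple mean benchmark.
sl.lib <- c("SL.glm", "SL.randomForest", "SL.mean")

# Fit the super learner; method = "method.AUC" (which requires the
# cvAUC package) weights the base learners to maximize
# cross-validated AUC.
fit <- SuperLearner(Y = Y, X = X, family = binomial(),
                    SL.library = sl.lib, method = "method.AUC")

fit$coef          # weight assigned to each base learner
head(fit$SL.predict)  # ensemble predicted probabilities
```

The fitted object also reports each base learner's own cross-validated risk, which is what makes the comparison between the individual learners and the combined super learner straightforward.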