Recently Published
Support Vector Machines
Support Vector Machines provided a robust method for classifying purchase behavior. While the linear kernel was simple and interpretable, the radial kernel captured more complex relationships in the data. Cross-validation helped prevent overfitting and select optimal model parameters.
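The post itself does not include code, but the contrast between the two kernels can be sketched in a few lines. This is a minimal pure-Python illustration (function names are mine, not from the original analysis): the linear kernel is just an inner product, while the radial (RBF) kernel turns squared distance into a similarity that decays smoothly, which is what lets the SVM fit non-linear boundaries.

```python
import math

def linear_kernel(x, z):
    # Inner product: similarity grows with alignment of the two vectors.
    return sum(xi * zi for xi, zi in zip(x, z))

def rbf_kernel(x, z, gamma=1.0):
    # exp(-gamma * ||x - z||^2): similarity decays with distance,
    # letting the SVM capture more complex, non-linear relationships.
    sq_dist = sum((xi - zi) ** 2 for xi, zi in zip(x, z))
    return math.exp(-gamma * sq_dist)

a, b = [1.0, 0.0], [0.0, 1.0]
print(linear_kernel(a, b))               # orthogonal points: 0.0
print(rbf_kernel(a, a))                  # identical points: 1.0
print(round(rbf_kernel(a, b), 4))        # exp(-2) ~ 0.1353
```

In practice `gamma` is exactly the kind of parameter the post tunes by cross-validation: larger values make the kernel more local and the fit more flexible.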
Tree-Based Methods
Tree-Based Methods in R
HW 5
Linear model selection involves identifying a subset of predictors that best explains the response variable, balancing model complexity and predictive accuracy. Methods like best subset selection, forward selection, and backward elimination help choose the most relevant variables. Regularization techniques such as ridge regression and the lasso improve model performance by introducing a penalty on the size of coefficients to reduce overfitting. While ridge shrinks all coefficients toward zero, the lasso can force some to be exactly zero, thus performing variable selection as well.
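The ridge-versus-lasso contrast in the last sentence has a clean closed form in the special case of orthonormal predictors, which makes it easy to demonstrate. A small sketch under that assumption (not code from the original post): ridge rescales each least-squares coefficient toward zero, while the lasso soft-thresholds it and can set it exactly to zero.

```python
def ridge_shrink(beta, lam):
    # Orthonormal-design ridge: scales toward zero, never exactly zero.
    return beta / (1.0 + lam)

def lasso_shrink(beta, lam):
    # Orthonormal-design lasso: soft-thresholding, can hit exactly zero,
    # which is why the lasso performs variable selection.
    if beta > lam:
        return beta - lam
    if beta < -lam:
        return beta + lam
    return 0.0

betas = [3.0, 0.4, -2.0]
print([round(ridge_shrink(b, 1.0), 2) for b in betas])  # [1.5, 0.2, -1.0]
print([lasso_shrink(b, 1.0) for b in betas])            # [2.0, 0.0, -1.0]
```

Note how the small coefficient 0.4 survives ridge (shrunk to 0.2) but is zeroed out by the lasso: that is the variable-selection behavior described above.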
Sampling Techniques
Sampling techniques in machine learning help evaluate model performance by partitioning data in different ways. **Leave-One-Out Cross-Validation (LOOCV)** trains on all but one observation and repeats this for each observation, while **k-Fold Cross-Validation** divides the data into k subsets, training on k-1 folds and testing on the remaining fold. The **Validation Approach (Train-Test Split)** randomly splits the data into training and testing sets, offering simplicity at the cost of higher variance. **Bootstrap Resampling** draws repeated samples with replacement to estimate model uncertainty; it is useful for small datasets, though its error estimates can be optimistically biased because bootstrap samples overlap heavily with the training data.
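Two of these resampling schemes can be sketched with nothing but the standard library. This is an illustrative pure-Python version (not code from the original post): `k_fold_indices` deals shuffled indices into k folds, and `bootstrap_sample` draws indices with replacement, so some observations repeat and others are left out.

```python
import random

def k_fold_indices(n, k, seed=0):
    # Shuffle the indices, then deal them into k roughly equal folds.
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def bootstrap_sample(n, seed=0):
    # Draw n indices with replacement; duplicates are expected.
    rng = random.Random(seed)
    return [rng.randrange(n) for _ in range(n)]

folds = k_fold_indices(10, 5)
print([len(f) for f in folds])   # [2, 2, 2, 2, 2]
boot = bootstrap_sample(10)
print(len(boot))                 # 10
```

LOOCV is just the k = n special case of `k_fold_indices`, and the validation approach is the k = 2 case where only one fold is ever used for testing.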
Linear Regression and Logistic Regression Models
We began by cleaning the data: handling missing values, removing outliers, and formatting categorical variables. Through exploratory data analysis, we used visualizations and summary statistics to understand variable distributions and relationships. We fit a linear regression model to predict a continuous outcome and evaluated it using metrics such as R-squared. For the logistic regression model, we predicted a binary outcome and interpreted the fitted coefficients as odds ratios.
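The linear-regression step above can be illustrated end to end for a single predictor, where least squares has a closed form. This is a pure-Python sketch with made-up data, not the project's actual model or dataset:

```python
def simple_linear_regression(xs, ys):
    # Least-squares fit of y = b0 + b1*x for one predictor.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b1 = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
          / sum((x - mx) ** 2 for x in xs))
    b0 = my - b1 * mx
    return b0, b1

def r_squared(xs, ys, b0, b1):
    # Proportion of variance in y explained by the fitted line.
    my = sum(ys) / len(ys)
    ss_res = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot

xs, ys = [1.0, 2.0, 3.0, 4.0], [2.1, 3.9, 6.2, 7.8]
b0, b1 = simple_linear_regression(xs, ys)
print(round(b1, 2), round(r_squared(xs, ys, b0, b1), 3))  # 1.94 0.996
```

For the logistic model, the analogous interpretation step is exponentiating a fitted coefficient (`math.exp(beta)`) to read it as an odds ratio.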
Classification
In this analysis, multiple classification models, including K-Nearest Neighbors (KNN), Linear Discriminant Analysis (LDA), Generalized Linear Models (GLM), Quadratic Discriminant Analysis (QDA), and Naive Bayes, are applied to various datasets. The goal is to explore and compare the performance of these models in predicting categorical outcomes. The analysis involves data preprocessing, model training, and evaluation of prediction accuracy using metrics like confusion matrices. Each model's strengths and limitations are discussed in the context of the datasets used.
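KNN is the simplest of these classifiers to write from scratch, and a confusion matrix is just a table of actual-versus-predicted counts. A minimal pure-Python sketch of both (toy data and function names are mine, not from the analysis):

```python
from collections import Counter

def knn_predict(train_X, train_y, x, k=3):
    # Majority vote among the k nearest training points (squared
    # Euclidean distance; no need to take the square root for ranking).
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), label)
        for row, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

def confusion_matrix(actual, predicted, labels):
    # Rows index the actual class, columns the predicted class.
    m = {a: {p: 0 for p in labels} for a in labels}
    for a, p in zip(actual, predicted):
        m[a][p] += 1
    return m

train_X = [[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]]
train_y = ["A", "A", "B", "B"]
test_X = [[0.05, 0.1], [1.05, 0.9]]
preds = [knn_predict(train_X, train_y, x, k=3) for x in test_X]
print(preds)  # ['A', 'B']
print(confusion_matrix(["A", "B"], preds, ["A", "B"]))
```

The off-diagonal entries of the confusion matrix are the misclassifications; comparing those counts across KNN, LDA, QDA, GLM, and Naive Bayes is how the accuracy comparison above is made.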