A tong in cheek regression analysis of an interesting spurious correlation.
This study set out to see if the size of a home and the era in which it was built could be used to get an estimate of the sale price. While these variables seem to be relevant, predictions made only using the variables are largely inaccurate. It appears that the relationship is more complicated
MySQL to NoSQL (MongoDB) Migration Exercise
Is the email message spam or ham? This SVM can help
The assignment is to choose one of the New York Times APIs, construct an interface in R to read in the JSON data, and transform it to an R dataframe.
Our motivation for this study is to gain an understanding of which skills are the most useful for a data scientist to have so that we can plan what courses to take in our Master’s program. We scraped 453 job postings from Dice.com. The top skills include: Python, Machine Learning, Data Analysis, SQL and R.
Foundations for statistical inference - Sampling distributions
In this assignment I have selected three books that I enjoyed reading and have entered information these three books in separately created files. One encodes the information in HTML, XML and JSON formats. I am to pull in the data into R and answer if the three data frames are identical.
Creating 3 tidy data sets
Graded problems for the OpenIntro Statistics chapter on Distributions of Random Variables. My answers to the questions are in bold.
Analysis of two airlines arrival ratings for five airports
Graded problems for the Intro to Data chapter of Probability. My answers to the questions are in bold.
A little fun with regular expressions.
This project extracts player information from a semi-structured text file.
Your task is to create an R dataframe that shows rates of tuberculosis infection by country. You should use the information from the tb database and the attached population.csv file. Your R dataframe should have the following columns: Country, Year, Rate (Where Rate is defined as Cases/Population).
Choose six recent popular movies. Ask at least five people that you know (friends, family, classmates, imaginary friends) to rate each of these movie that they have seen on a scale of 1 to 5. Take the results (observations) and store them in a SQL database. Load the information into an R dataframe. The films used in this study were selected by Googling “2018 recent films” and getting the first 6 films listed in the search results.
Introduction to Data
Graded problems for the Intro to Data chapter of Open Intro to Statistics. My answers to the questions are bolded.
Introduction to R and RStudio
The assignment is to load the Mushroom Dataset from the UCI repository into a data frame, rename the columns into something meaningful, replace the abbreviations used in the data, and subset the columns of the data frame. It must include the column that indicated if the mushroom is edible or poisonous and three or four other columns.
This document explains how the metro employment index is derived. This index is base on the Local Area Unemployment Statistics (LAUS) published by the U.S. Bureau of Labor Statistics (BLS). It is seasonally adjusted using the Loess seasonal decomposition method.
This analysis was preformed as part of Coursera's Reproducible Research course.