On April 4, 2017, at least 80 civilians died and hundreds were injured in the town of Khan Sheikhoun in north-western Syria after exposure to sarin, a nerve agent used as a chemical weapon. The reason for the release and the means by which the chemical reached the Syrian people remain a matter of controversy, but whatever events led to this tragedy, the loss of life is an atrocity of war about which the world did not remain silent. This report uses basic text analysis procedures to see what Twitter had to say on the day of the event.
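As a rough illustration of what "basic text analysis" can mean here, the sketch below tokenizes tweets and counts word frequencies. It is written in Python for illustration only and is not the report's code; the stop-word list, the `tokenize` and `word_frequencies` helpers, and the sample tweets are all made-up placeholders.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; a real analysis would use a fuller one.
STOP_WORDS = {"the", "a", "an", "in", "of", "to", "and", "is", "on", "for", "rt"}

def tokenize(tweet):
    """Lowercase a tweet, drop URLs and @mentions, and return its non-stop-word tokens."""
    text = tweet.lower()
    text = re.sub(r"https?://\S+|@\w+", " ", text)
    return [w for w in re.findall(r"[a-z']+", text) if w not in STOP_WORDS]

def word_frequencies(tweets):
    """Count how often each token appears across all tweets."""
    counts = Counter()
    for tweet in tweets:
        counts.update(tokenize(tweet))
    return counts

# Made-up placeholder tweets, not data from the report.
sample_tweets = [
    "RT @user: Reports of a chemical attack in Syria https://example.com/x",
    "The world reacts to the attack in Syria",
]
counts = word_frequencies(sample_tweets)
print(counts.most_common(5))
```

Stripping URLs and mentions before counting keeps link fragments and account handles from crowding out the words people actually used.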
Investigating the incidence of missing persons in the United States. Preliminary analysis will look at missing person rates by state, the dominant missing sex by state, and the odds of being missing by sex and state. We will conduct an ecological analysis to determine whether females and males are going missing at different rates, and we will do the same to compare whites and minorities. For the non-ecological analysis, we will evaluate the number of days a person has been missing, by sex, using Complete Pooling, No Pooling, and Partial Pooling methods.
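The "odds of being missing by sex and state" can be sketched as follows. This is a minimal Python illustration, assuming odds are defined as missing persons divided by non-missing persons within a group; the `odds` and `odds_ratio` helpers and all counts are hypothetical and do not come from the analysis.

```python
def odds(missing, population):
    """Odds of being missing: cases divided by non-cases in the group."""
    return missing / (population - missing)

def odds_ratio(missing_a, pop_a, missing_b, pop_b):
    """Odds of group A relative to group B; values above 1 mean A has higher odds."""
    return odds(missing_a, pop_a) / odds(missing_b, pop_b)

# Hypothetical counts for a single state, for illustration only.
female = {"missing": 50, "population": 100_050}
male = {"missing": 100, "population": 100_100}

male_vs_female = odds_ratio(
    male["missing"], male["population"],
    female["missing"], female["population"],
)
print(f"Odds ratio (male vs female): {male_vs_female:.2f}")
```

Repeating the calculation per state gives one odds ratio per state, which is the kind of quantity an ecological comparison would work from.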
Using logistic and multinomial regression, I will evaluate the impact of a film's production budget on IMDb movie ratings. Drama and comedy films only.
Using GLM analysis, we will determine whether boys or girls are more likely to score above average on the New York State Math and ELA tests.
Using the sample "turnout" dataset from R, we will identify whether income dispersion increases or decreases as a function of age and/or education level. This assessment will be done using a maximum likelihood estimation function.
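One way such a likelihood could be set up is sketched below in Python. It assumes, purely for illustration, a normal model in which both the mean of income and the log of its variance are linear in age and education; the project's actual specification is not stated here, and the function name, parameter layout, and row layout are hypothetical.

```python
import math

def neg_log_likelihood(params, rows):
    """Negative log-likelihood of a normal model with covariate-dependent variance.

    params: (b0, b1, b2, g0, g1, g2), where the b's drive the mean
            and the g's drive the log-variance (an assumed specification).
    rows:   iterable of (income, age, education) tuples.
    """
    b0, b1, b2, g0, g1, g2 = params
    total = 0.0
    for income, age, education in rows:
        mu = b0 + b1 * age + b2 * education
        log_var = g0 + g1 * age + g2 * education  # log link keeps the variance positive
        var = math.exp(log_var)
        total += 0.5 * (math.log(2 * math.pi) + log_var + (income - mu) ** 2 / var)
    return total

# Single made-up observation, used only to show the call.
nll = neg_log_likelihood((1.0, 0.0, 0.0, 0.0, 0.0, 0.0), [(1.0, 0.0, 0.0)])
print(nll)
```

Minimizing this function with a numerical optimizer yields the estimates; under this setup a positive `g1` would indicate that income dispersion grows with age, and a positive `g2` that it grows with education.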
What percentage of each county's total arrests are drug-related? A simple visualization using 2012 US crime data from the Uniform Crime Reporting Program, obtained via Social Explorer.
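The quantity being visualized is a simple share per county. A minimal Python sketch follows, with made-up county names and counts rather than the UCR figures; the `drug_arrest_share` helper is hypothetical.

```python
def drug_arrest_share(total_arrests, drug_arrests):
    """Percentage of a county's arrests that are drug-related, or None if it reported no arrests."""
    if total_arrests == 0:
        return None  # avoid dividing by zero for counties with no reported arrests
    return 100.0 * drug_arrests / total_arrests

# Made-up placeholder counts, not values from the 2012 data.
county_arrests = {
    "County A": {"total": 200, "drug": 50},
    "County B": {"total": 0, "drug": 0},
}

shares = {
    name: drug_arrest_share(c["total"], c["drug"])
    for name, c in county_arrests.items()
}
print(shares)
```

Counties with no reported arrests are kept as `None` rather than 0 so that a map or chart can show them as missing instead of as having no drug arrests.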
Using Zelig and a logit model, we will evaluate whether an increase in price also increased the likelihood that daily revenue exceeded the 3-year average.
Using Zelig and a logistic regression (logit) model, we estimate the probability of surviving the Titanic based on sex, age, fare price, and cabin class.
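To show how a fitted logit model turns predictors into a survival probability, here is a small Python sketch. The coefficients are hypothetical placeholders, not estimates from this analysis, and the sketch does not use Zelig; it only applies the inverse-logit transformation to a linear predictor.

```python
import math

def inv_logit(x):
    """Map a linear predictor on the log-odds scale to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical coefficients for illustration; NOT fitted values from the report.
COEFS = {"intercept": 1.0, "female": 2.5, "age": -0.03, "fare": 0.002, "pclass": -1.0}

def survival_probability(female, age, fare, pclass):
    """Predicted survival probability for one passenger under the assumed coefficients."""
    linear = (
        COEFS["intercept"]
        + COEFS["female"] * female
        + COEFS["age"] * age
        + COEFS["fare"] * fare
        + COEFS["pclass"] * pclass
    )
    return inv_logit(linear)

p_woman = survival_probability(female=1, age=30, fare=50.0, pclass=1)
p_man = survival_probability(female=0, age=30, fare=50.0, pclass=1)
print(f"woman: {p_woman:.3f}, man: {p_man:.3f}")
```

Comparing two passengers who differ in a single predictor, as above, is a convenient way to read a logit model, since the raw coefficients are on the log-odds scale.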
This is a draft of potential research questions, to be refined as the questions are further considered and developed.
Notes taken while following along with "R for Data Science" from O'Reilly, chapter 3, on basic visualizations using ggplot2.