ofomicheva86

olga fomicheva

Recently Published

Data 608 Project Proposal
Final Project
Data 608 Project Proposal
Module3
Data Visualization. HW1
test
Assignment 15
DATA 621. Final Project
Discussion 15
Assignment 14
Discussion 14
HW5
Assignment 13
Discussion 13
Assignment 12
Discussion 12
Assignment 11
HW4
Document
HW3
Discussion 10
Homework 2
Assignment 9
Assignment 8
Discussion 8
Assignment 7
Discussion 7
Assignment 6
Assignment 5
Discussion 5
HW1
Assignment 4
Discussion 4
Assignment 3
Discussion 3
Assignment 2
Discussion 2
Discussion 1
Assignment 1
Homework 8
Project 606
Project 606
Project 606
DATA 606. Final Exam
The best classifier
Lab 8
Document
Homework 11
HW7
Lab 7
Homework 6
HW 5
Lab 6
Lab5
Project 4
It can be useful to be able to classify new "test" documents using already classified "training" documents. A common example is using a corpus of labeled spam and ham (non-spam) e-mails to predict whether or not a new document is spam. For this project, you can start with a spam/ham dataset, then predict the class of new documents (either withheld from the training dataset or from another source such as your own spam folder). One example corpus: https://spamassassin.apache.org/publiccorpus/
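The train-then-predict workflow described above can be sketched with a small multinomial naive Bayes classifier. This is only an illustrative sketch in Python (the project itself asks for no particular method), and the four training messages are made-up placeholders, not taken from the SpamAssassin corpus.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def train(labeled_docs):
    """Count words per class and documents per class."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    doc_counts = Counter()
    for text, label in labeled_docs:
        doc_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = set(word_counts["spam"]) | set(word_counts["ham"])
    return word_counts, doc_counts, vocab


def classify(text, model):
    """Return the class with the highest log-posterior score."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    scores = {}
    for label, counts in word_counts.items():
        total_words = sum(counts.values())
        score = math.log(doc_counts[label] / total_docs)  # class prior
        for word in tokenize(text):
            if word not in vocab:
                continue  # ignore words never seen in training
            # Laplace (add-one) smoothing avoids log(0) for unseen pairs.
            score += math.log((counts[word] + 1) / (total_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)


# Made-up toy training set; a real run would read the labeled e-mail files.
training = [
    ("win free money now", "spam"),
    ("free prize click now", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch on monday with the team", "ham"),
]
model = train(training)
print(classify("claim your free prize", model))
print(classify("agenda for the monday meeting", model))
```

With a real corpus, the withheld test messages would be passed through `classify` and the predictions compared against their known labels.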
DATA 606. Project Proposal
Assignment 9
The New York Times web site provides a rich set of APIs, as described at http://developer.nytimes.com/docs. You’ll need to start by signing up for an API key. Your task is to choose one of the New York Times APIs, construct an interface in R to read in the JSON data, and transform it to an R dataframe.
Presentation. Problem 5.13
HW 4
Lab 4b
Lab 4a
Assignment 7
Assignment 7
Pick three of your favorite books on one of your favorite subjects. At least one of the books should have more than one author. For each book, include the title, authors, and two or three other attributes that you find interesting. Take the information that you’ve selected about these three books, and separately create three files which store the book’s information in HTML (using an HTML table), XML, and JSON formats (e.g. “books.html”, “books.xml”, and “books.json”). To help you better understand the different file structures, I’d prefer that you create each of these files “by hand” unless you’re already very comfortable with the file formats. Write R code, using your packages of choice, to load the information from each of the three sources into separate R data frames. Are the three data frames identical?
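One way to see why the loaded tables may differ is to parse the same records from two of the formats and compare them. The sketch below uses Python's standard library rather than R, covers only JSON and XML (the HTML table is omitted for brevity), and uses placeholder book records and a hypothetical element layout.

```python
import json
import xml.etree.ElementTree as ET

# Placeholder records; the real files would hold the three chosen books.
BOOKS_JSON = """
{"books": [
  {"title": "Example Book A", "authors": ["First Author", "Second Author"], "year": 2001},
  {"title": "Example Book B", "authors": ["Third Author"], "year": 2010}
]}
"""

BOOKS_XML = """
<books>
  <book>
    <title>Example Book A</title>
    <authors><author>First Author</author><author>Second Author</author></authors>
    <year>2001</year>
  </book>
  <book>
    <title>Example Book B</title>
    <authors><author>Third Author</author></authors>
    <year>2010</year>
  </book>
</books>
"""


def rows_from_json(text):
    """One dict per book; multiple authors are joined into one cell."""
    return [
        {"title": b["title"], "authors": "; ".join(b["authors"]), "year": b["year"]}
        for b in json.loads(text)["books"]
    ]


def rows_from_xml(text):
    """Same columns as rows_from_json, read from the XML layout above."""
    rows = []
    for book in ET.fromstring(text.strip()).findall("book"):
        rows.append({
            "title": book.findtext("title"),
            "authors": "; ".join(a.text for a in book.iter("author")),
            "year": book.findtext("year"),  # XML has no types: this is a string
        })
    return rows


json_rows = rows_from_json(BOOKS_JSON)
xml_rows = rows_from_xml(BOOKS_XML)

# The content matches, but the year is an int in one table and a str in the other.
print(json_rows == xml_rows)

# After coercing the XML year to int the two tables agree.
xml_typed = [dict(row, year=int(row["year"])) for row in xml_rows]
print(json_rows == xml_typed)
```

The same kind of type mismatch is a likely reason data frames built from different formats are not identical even when they hold the same information.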
Project 2
Document
HW3
Homework 3
Assignment 5
Document
(1) Create a .CSV file (or optionally, a MySQL database!) that includes all of the information above. You’re encouraged to use a “wide” structure similar to how the information appears above, so that you can practice tidying and transformations as described below. (2) Read the information from your .CSV file into R, and use tidyr and dplyr as needed to tidy and transform your data. (3) Perform analysis to compare the arrival delays for the two airlines.
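The wide-to-long tidying step can be illustrated as follows. This is a sketch in Python rather than tidyr and dplyr, and the airline names, city names, and flight counts are invented placeholders, since the original table is not reproduced here.

```python
import csv
import io
from collections import Counter

# Invented "wide" table: one column per city, one row per airline and status.
WIDE_CSV = """airline,status,City A,City B
Airline X,on time,90,40
Airline X,delayed,10,10
Airline Y,on time,70,160
Airline Y,delayed,30,40
"""


def to_long(text):
    """Turn each city column into its own row (one observation per row)."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        for column, value in record.items():
            if column in ("airline", "status"):
                continue  # identifier columns stay as they are
            rows.append({
                "airline": record["airline"],
                "status": record["status"],
                "city": column,
                "flights": int(value),
            })
    return rows


def delay_rates(long_rows):
    """Share of delayed flights for each (airline, city) pair."""
    totals, delayed = Counter(), Counter()
    for row in long_rows:
        key = (row["airline"], row["city"])
        totals[key] += row["flights"]
        if row["status"] == "delayed":
            delayed[key] += row["flights"]
    return {key: delayed[key] / totals[key] for key in totals}


long_rows = to_long(WIDE_CSV)
rates = delay_rates(long_rows)
for (airline, city), rate in sorted(rates.items()):
    print(f"{airline:10s} {city:7s} {rate:.0%}")
```

Comparing delay rates per city as well as overall is worthwhile, because the two views can rank the airlines differently.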
Lab3
Homework 2
Lab2
Homework 2
Choose six recent popular movies. Ask at least five people that you know (friends, family, classmates, imaginary friends) to rate each of these movies that they have seen on a scale of 1 to 5. Take the results (observations) and store them in a SQL database. Load the information into an R dataframe.
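The store-then-load round trip might look like the sketch below. It assumes an in-memory SQLite database and Python in place of a database server and R; the table name, column names, viewers, and ratings are all hypothetical. A movie the viewer has not seen is stored as NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # lets rows be read by column name

# Hypothetical schema: one row per (viewer, movie); rating is NULL if unseen.
conn.execute(
    """
    CREATE TABLE ratings (
        viewer TEXT NOT NULL,
        movie  TEXT NOT NULL,
        rating INTEGER CHECK (rating BETWEEN 1 AND 5)
    )
    """
)
conn.executemany(
    "INSERT INTO ratings (viewer, movie, rating) VALUES (?, ?, ?)",
    [
        ("Viewer 1", "Movie A", 4),
        ("Viewer 1", "Movie B", None),  # not seen
        ("Viewer 2", "Movie A", 2),
        ("Viewer 2", "Movie B", 5),
    ],
)

# Load the table into a list of dicts, a stand-in for a data frame.
rows = [
    dict(r)
    for r in conn.execute(
        "SELECT viewer, movie, rating FROM ratings ORDER BY viewer, movie"
    )
]

# AVG skips NULLs, so unseen movies do not drag the mean down.
averages = {
    r["movie"]: r["mean_rating"]
    for r in conn.execute(
        "SELECT movie, AVG(rating) AS mean_rating FROM ratings GROUP BY movie"
    )
}
print(rows)
print(averages)
conn.close()
```

Storing unseen movies as NULL rather than 0 keeps them out of any averages computed later.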
Introduction to data
Assignment – Loading Data into a Data Frame
The task is to study the dataset and the associated description of the data (i.e. “data dictionary”). You may need to look around a bit, but it’s there! You should take the data, and create a data frame with a subset of the columns in the dataset. You should include the column that indicates edible or poisonous and three or four other columns. You should also add meaningful column names and replace the abbreviations used in the data—for example, in the appropriate column, “e” might become “edible.” Your deliverable is the R code to perform these transformation tasks.
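The subset, rename, and recode steps can be sketched as below, in Python rather than the R the assignment asks for. The sample rows, the column positions, and every code mapping other than "e" to "edible" are assumptions made for illustration and should be checked against the data dictionary.

```python
import csv
import io

# Made-up sample rows in a headerless, comma-separated layout.
RAW = """p,x,s,n,t,p
e,x,s,y,t,a
e,b,s,w,t,l
"""

# Assumed positions of the columns to keep, with meaningful names.
COLUMNS = {0: "edibility", 1: "cap_shape", 5: "odor"}

# Assumed abbreviations; only "e" -> "edible" is stated in the task text.
CODES = {
    "edibility": {"e": "edible", "p": "poisonous"},
    "cap_shape": {"x": "convex", "b": "bell"},
    "odor": {"p": "pungent", "a": "almond", "l": "anise"},
}


def load_subset(text):
    """Keep the chosen columns, name them, and expand the abbreviations."""
    rows = []
    for record in csv.reader(io.StringIO(text)):
        row = {}
        for index, name in COLUMNS.items():
            code = record[index]
            # Fall back to the raw code when no mapping is known.
            row[name] = CODES[name].get(code, code)
        rows.append(row)
    return rows


mushrooms = load_subset(RAW)
for row in mushrooms:
    print(row)
```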