Recently Published
Covid19 Article Search
The purpose of this project is to develop text and data mining tools that can help the medical community develop answers to high-priority scientific questions regarding the COVID-19 pandemic.
Signal Pattern Detector
SIGNAL is an application capable of accepting signal data and visually presenting it to end users, so signal portions that represent patterns can be easily identified and labeled. The labeled portions are stored in a persistent medium for later retrieval and use.
Next Word Predictor
Computational mechanism, based on Natural Language Processing techniques, that predicts the "next word" for a given input string, which may consist of a single word or a phrase.
Data Science Capstone Project: Next Word Prediction Using NLP Techniques - Milestone Report Nr. 1
Natural Language Processing (NLP) is an important area of Artificial Intelligence (a field that also includes machine learning) that contributes to finding efficient ways to communicate with humans and to learn from interactions with them. One such contribution is presenting mobile users with predicted "next words" as they type in apps like WhatsApp, which speeds up message delivery by letting the user select a proposed word instead of typing it.
The purpose of this project is to expose students to the basics of NLP. The student will analyze and clean unstructured textual data from various sources (i.e., blogs, news, and tweets), prepare an n-gram model with the resulting tidy data, and develop a "next word" prediction model that can be used and tested as a Shiny app.
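As a rough illustration of the n-gram idea described above, the following minimal sketch builds a bigram model (n = 2) and proposes the most frequent followers of the last typed word. It is written in Python rather than as a Shiny app, and the function names and the toy corpus are assumptions made for this example, not the project's actual code or data.

```python
from collections import Counter, defaultdict
import re

def tokenize(text):
    """Lowercase the text and keep runs of letters and apostrophes as tokens."""
    return re.findall(r"[a-z']+", text.lower())

def build_bigram_model(sentences):
    """Count, for every word, which words follow it and how often."""
    followers = defaultdict(Counter)
    for sentence in sentences:
        tokens = tokenize(sentence)
        for current, nxt in zip(tokens, tokens[1:]):
            followers[current][nxt] += 1
    return followers

def predict_next(model, phrase, k=3):
    """Return up to k of the most frequent followers of the phrase's last word."""
    tokens = tokenize(phrase)
    if not tokens or tokens[-1] not in model:
        return []  # unseen word: this sketch has no back-off strategy
    return [word for word, _ in model[tokens[-1]].most_common(k)]

# Toy corpus, invented for illustration only.
toy_corpus = [
    "I want to go home",
    "I want to eat",
    "I want to go out",
    "we want a break",
]
model = build_bigram_model(toy_corpus)
print(predict_next(model, "they want"))
```

A fuller model would typically combine several n-gram orders and fall back to shorter contexts when a longer one has not been seen; this sketch omits that step.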
BMI Calculator
Presentation produced for the "Developing Data Products" course of the Data Science certification by Johns Hopkins University & Coursera.
Predicting Correct Workout Exercise Performance Based on Accelerometer Data
This report provides a prediction exercise for determining how correctly workout exercises are performed. The data were obtained from accelerometers used by six individuals who wore them on the belt, arm, forearm, and dumbbells that they used to do a set of exercises.
Using three machine learning techniques, namely gradient boosting, decision trees, and random forests, prediction models were created from the training data set provided for this project. This training data was divided into two partitions: 75% was used to train the models and the remaining 25% was used to test them. The results were cross-validated, and the model with the highest prediction accuracy was selected to process the testing data set that was also provided for this project.
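The partition-and-select procedure described above can be sketched as follows. This is a minimal Python illustration using only the standard library: the helper names are hypothetical, the data are invented, and two fixed rule functions stand in for the fitted gradient boosting, decision tree, and random forest models, so no training step is shown.

```python
import random

def split_train_test(rows, train_fraction=0.75, seed=0):
    """Shuffle a copy of rows and split it into training and held-out parts."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

def accuracy(predict, labelled_rows):
    """Fraction of (features, label) rows that predict() labels correctly."""
    hits = sum(1 for features, label in labelled_rows if predict(features) == label)
    return hits / len(labelled_rows)

def select_best(candidates, held_out):
    """Return the (name, accuracy) pair with the highest held-out accuracy."""
    scores = {name: accuracy(predict, held_out) for name, predict in candidates.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]

# Invented data: one numeric feature, label "A" below 10 and "B" otherwise.
rows = [((x,), "A" if x < 10 else "B") for x in range(20)]
train, held_out = split_train_test(rows)

# Hypothetical stand-ins for fitted models.
candidates = {
    "threshold_rule": lambda f: "A" if f[0] < 10 else "B",
    "always_A": lambda f: "A",
}
best_name, best_accuracy = select_best(candidates, held_out)
print(best_name, best_accuracy)
```

Scoring every candidate on the same held-out partition keeps the comparison fair, since no model is evaluated on rows it was trained on.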
Comparing Fuel Efficiency in Miles per Gallon Between Automatic and Manual Transmission Automobiles
This report provides an analysis and evaluation of fuel efficiency, in miles per gallon (MPG), between automatic and manual transmission automobiles. The data were extracted from the 1974 Motor Trend US magazine and comprise fuel consumption and 10 aspects of automobile design and performance for 32 automobiles (1973–74 models).
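One common way to compare the mean MPG of two groups is Welch's two-sample t statistic, sketched below in Python. This is an assumption about how such a comparison could be made, not necessarily the method used in the report, and the sample values are invented placeholders rather than the Motor Trend data.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(sample_a, sample_b):
    """Welch's t statistic and approximate degrees of freedom for two samples."""
    # Squared standard error contributed by each group.
    va = variance(sample_a) / len(sample_a)
    vb = variance(sample_b) / len(sample_b)
    t = (mean(sample_a) - mean(sample_b)) / sqrt(va + vb)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (va + vb) ** 2 / (
        va ** 2 / (len(sample_a) - 1) + vb ** 2 / (len(sample_b) - 1)
    )
    return t, df

# Placeholder MPG values, invented for illustration only.
manual_mpg = [20.0, 22.0, 24.0, 26.0]
automatic_mpg = [16.0, 17.0, 18.0, 19.0]

t_stat, dof = welch_t(manual_mpg, automatic_mpg)
print(round(t_stat, 3), round(dof, 2))
```

A large absolute t value suggests the difference in group means is unlikely to be due to sampling noise alone. It does not account for other variables, such as weight, that may differ between the two groups.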
Testing the Overall Effectiveness of Orange Juice vs. Ascorbic Acid in Guinea Pig Tooth Growth
The ToothGrowth dataset records the length of odontoblasts (cells responsible for tooth growth) in 60 guinea pigs. Each animal received one of three dose levels of vitamin C (0.5, 1, and 2 mg/day) by one of two delivery methods: orange juice or ascorbic acid (a form of vitamin C, coded as VC).
Question: Is orange juice more effective than ascorbic acid in producing odontoblasts of a certain minimum length?
Exponential Distribution Analysis to Show Normality with Large Simulation Samples
To investigate the exponential distribution in R and compare the distribution of its simulated sample means with what the Central Limit Theorem predicts.
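The kind of simulation this involves can be sketched as follows. The example is in Python rather than R, and the rate, sample size, and number of simulations are assumed values chosen for illustration. For an exponential distribution with rate λ, the Central Limit Theorem predicts that means of samples of size n are approximately normal with mean 1/λ and standard deviation (1/λ)/√n.

```python
import random
from math import sqrt
from statistics import mean, stdev

def simulate_sample_means(rate, sample_size, n_simulations, seed=1):
    """Draw n_simulations exponential samples and return each sample's mean."""
    rng = random.Random(seed)
    return [
        mean([rng.expovariate(rate) for _ in range(sample_size)])
        for _ in range(n_simulations)
    ]

# Assumed parameters for illustration.
rate, sample_size, n_simulations = 0.2, 40, 1000
means = simulate_sample_means(rate, sample_size, n_simulations)

theoretical_mean = 1 / rate
theoretical_sd = (1 / rate) / sqrt(sample_size)

print("mean of sample means:", mean(means), "theory:", theoretical_mean)
print("sd of sample means:  ", stdev(means), "theory:", theoretical_sd)
```

Plotting a histogram of `means` against the corresponding normal density would show the bell shape emerging, even though individual exponential draws are strongly right-skewed.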