Recently Published
The Purchase Sales Brain Shiny App: To predict the Airline tickets' sales from 1961 to 1970. Case: The classic Box & Jenkins Airline
The Purchase & Sales Brain is a Shiny app designed to predict airline ticket sales from 1961 to 1970 using an ARIMA model fitted to the AirPassengers time series available in the R datasets package (the classic Box & Jenkins airline data). The series is monthly, in thousands, and records airline passenger numbers from 1949 to 1960. The app is made up of three sections. The first produces a forecast for the range of years that the user selects with a slider. The second presents the ARIMA model validation and the corresponding p-values. The third presents an exploratory analysis of the AirPassengers time series. Each section includes graphics that the user can customize.
Darwin Reynell Nava's Job Experience
Background: Interactive maps are used worldwide.
Objectives: Create a web page using R Markdown that features a map created with the leaflet package. It should be hosted on RPubs or GitHub Pages.
Methods: Creation of markers on a Leaflet map.
Results: Seven markers were created indicating building locations and Darwin Nava's job experience.
Conclusions: The web pages were created and the mapping with Leaflet was completed. The pages are hosted on RPubs and GitHub Pages. Links are available in this document.
The Weight Lifting: Are you doing your unilateral dumbbell biceps curl wrong?
Background: Data from belt, forearm, arm, and dumbbell accelerometers of 6 participants who performed dumbbell unilateral biceps curls.
Objectives: Design and analyze a machine learning model to predict the execution quality of unilateral dumbbell biceps curls.
Methods: An inference and prediction analysis in R.
Results: 1. The random forest model accuracy: 0.9584. 2. Predictions on the pml_testing data (out-of-sample error in a new dataset): B A A A A E D B A A A C B A E E A B B B (levels: A B C D E). 19 of 20 predictions were correct.
Conclusions: 95% of the predictions were correct on the pml_testing dataset with the designed random forest model. The accuracy of the random forest is good. It showed high performance in predicting execution quality.
Manual vs automatic car transmission. Which one has better fuel economy?
Background: Motor Trend is a magazine about the automobile industry. They are interested in exploring the relationship between a car's transmission type and miles per gallon (MPG).
Objectives: Determine the association between a car's transmission type and miles per gallon (mpg) using regression models. Determine which transmission type has better fuel economy. Quantify the MPG difference between automatic and manual transmissions.
Methods: A statistical inference analysis in R using regression models. The mtcars data available in the R datasets package are used.
Results: 1. A significant association was found between transmission type and miles per gallon (mpg) in the Fit1 model (unadjusted), where manual transmission has better fuel economy than automatic. The fit4 model (adjusted, with the best performance) shows that, holding wt and cyl constant, the two transmission types appear to have almost the same impact on mpg. 2. Regarding the MPG difference between automatic and manual transmissions, the Fit1 model shows that the mean of the "Manual" level (24.392) is 7.245 units higher than the mean of "Automatic" (listed as the intercept, 17.147), p-value: 0.000285.
Conclusions: From a mechanical design point of view, manual transmission engines are less complex, weigh less, and have more gears than automatics, thus favoring higher mpg for manual transmissions. The hypothesis tests showed that the mean wt (weight) is higher for automatic transmissions than for manual ones. Analogously, the average mpg is lower for automatic transmissions.
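The Fit1 result above (intercept equal to the "Automatic" mean, coefficient equal to the "Manual" minus "Automatic" difference) is a general property of least-squares regression on a single 0/1 indicator. The following is a minimal Python sketch of that property, not the original R analysis. The function name `ols_binary` and the six mpg values are hypothetical and chosen only for illustration; they are not taken from mtcars.

```python
from statistics import fmean

def ols_binary(x, y):
    """Least-squares fit of y = b0 + b1 * x for a single predictor x."""
    xbar, ybar = fmean(x), fmean(y)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    return b0, b1

# Hypothetical toy data (NOT mtcars): 0 = automatic, 1 = manual.
manual = [0, 0, 0, 1, 1, 1]
mpg = [15.0, 17.0, 19.0, 22.0, 24.0, 26.0]

b0, b1 = ols_binary(manual, mpg)

# Group means computed directly, for comparison with the fitted coefficients.
mean_auto = fmean(m for m, a in zip(mpg, manual) if a == 0)
mean_manual = fmean(m for m, a in zip(mpg, manual) if a == 1)

print("intercept:", b0, "| automatic mean:", mean_auto)
print("slope:", b1, "| manual - automatic:", mean_manual - mean_auto)
```

With a single binary predictor, the intercept matches the reference-group mean and the slope matches the difference in group means, which is how the 17.147 and 7.245 figures in the entry relate to each other. Adjusted models such as fit4 no longer have this simple interpretation.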
Is there an association between college major category and income?
Background: Professors of "The Data Science Specialization - Johns Hopkins University", taught through Coursera, are carrying out a research experiment to better understand how people analyze data. The student must carry out an inferential analysis of the data provided.
Objectives: Determine the association between college major category and income using linear regression. Provide a record of the analysis (the R command history).
Methods: A statistical inference analysis in R using linear regression. The college data from the GitHub repository jhudsl/collegeIncome are used.
Results: Most majors have similar income, except Business, which shows the highest income. Its result is significantly different from Computers & Mathematics, Education, Engineering, and Humanities & Liberal Arts. However, the diagnostics of the unadjusted model raise concerns: the Residuals vs Fitted plot shows that the points are not randomly dispersed around the horizontal axis; the QQ plot shows that the residuals are not Gaussian, and thus the errors are not either; the Scale-Location plot shows that the residuals are not randomly spread and that the red smooth line is not horizontal but has a steep angle; and the Residuals vs Leverage plot shows no influential cases for the regression results.
Conclusions: There is not a significant association between college major category and income. The major category "Business" shows the highest income in our model. However, the model fits poorly; a nonlinear model would be more appropriate for these data.
Does vitamin C dose size affect tooth growth in guinea pigs?
Background: Each animal received one of three dose levels of vitamin C (0.5, 1, and 2 mg/day) by one of two delivery methods, orange juice or ascorbic acid (a form of vitamin C and coded as VC). The response is the length of odontoblasts in 60 guinea pigs.
Objectives: Determine the vitamin C effect on tooth growth in guinea pigs.
Methods: A statistical inference analysis. The ToothGrowth data in the R datasets package are used.
Results: 1. Effectiveness of the delivery method disregarding dosage: p-value = 0.02635, H0: rejected. 2. Effectiveness of administered dose size regardless of delivery method: a) len|0.5 < len|1, p-value = 1.494e-07, H0: rejected. b) len|1 < len|2, p-value = 1.097e-05, H0: rejected. 3. Effectiveness of the 2 mg dose by delivery method: len|2|VC = len|2|OJ, p-value = 0.9639, H0: failed to reject.
Conclusions: Greater growth of odontoblasts is observed for the 0.5 and 1 mg doses delivered as OJ (orange juice). For the 2 mg dose, the delivery method does not influence such growth. Overall, OJ yields higher tooth growth in guinea pigs than VC. Regarding dose size, odontoblast growth increases as the dose increases.
Does the Central Limit Theorem (CLT) apply to Exponential Probability Distributions (EPD)?
Background: The CLT states that the distribution of sample means approximates a normal distribution as the sample size gets larger regardless of its distribution in the population.
Objectives: Determine the properties of the distribution of the mean of 40 exponentials with lambda=0.2. Determine whether the resulting distribution follows the CLT.
Methods: A statistical inference analysis via simulations in R.
Results: The sample mean is 4.99, while the theoretical mean of the distribution is 5. The sample variance is 25.064, while the theoretical variance of the distribution is 25. The distribution of a large collection of averages of 40 exponentials is approximately normal. However, the distribution of a large collection of random exponentials is exponential.
Conclusions: The Central Limit Theorem (CLT) applies to Exponential Probability Distributions (EPD) too.
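The simulation described in this entry can be sketched as follows. This is a minimal Python illustration using only the standard library, not the original R code. The rate (0.2) and the sample size (40) come from the entry; the number of simulated averages (1000), the seed, and the function name `simulate_means` are assumptions made for the sketch.

```python
import random
from statistics import fmean, variance

LAMBDA = 0.2   # rate parameter, as stated in the entry above
N = 40         # exponentials per average, as stated in the entry above
SIMS = 1000    # number of simulated averages (assumed)
SEED = 1       # arbitrary seed for reproducibility (assumed)

def simulate_means(sims=SIMS, n=N, lam=LAMBDA, seed=SEED):
    """Return `sims` averages, each taken over `n` exponential draws."""
    rng = random.Random(seed)
    return [fmean([rng.expovariate(lam) for _ in range(n)])
            for _ in range(sims)]

means = simulate_means()

# Theory: each draw has mean 1/lambda and variance 1/lambda^2, so an
# average of n draws has mean 1/lambda and variance 1/(lambda^2 * n).
print("mean of averages:", fmean(means), "| theory:", 1 / LAMBDA)
print("variance of averages:", variance(means),
      "| theory:", (1 / LAMBDA) ** 2 / N)
```

The averages should center near 1/lambda = 5, with a variance near 25/40 = 0.625. The theoretical variance of 25 quoted in the entry is that of a single exponential draw, so multiplying the variance of the averages by 40 gives a comparable quantity.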
What are the most dangerous/damaging weather event types in the United States?
Background: The United States experiences meteorological disasters each year, causing deaths, injuries, property damage, and disruptions to commerce.
Objectives: Determine which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health and which ones have the greatest economic consequences across the United States.
Methods: An exploratory statistical analysis. The data comes from the US NOAA Storm Database, 1950-2011.
Results: Tornadoes, Heat and Floods are the weather event types that cause the highest number of injuries and fatalities in the United States. Floods, Hurricanes or Typhoons, and Storm Surges/Tides cause the greatest economic losses in the United States.
Conclusions: The graphs show a trend of annual increase in the amount of material losses. The same pattern is observed in the fatalities and injuries variables.