RPubs

by RStudio

RockBen

Ebenezer U

Recently Published

Data Uniformity: A Statistic Assessment and management of Outlier and Missing Values In R

The uniformity of a dataset helps the analyst obtain more accurate results. Two major threats to accuracy are outliers and missing values that are not handled well. Pre-processing the data is therefore a crucial step in any analysis, and a central concern for any analyst who wants accurate insight from the dataset.

over 2 years ago

Dummy Variables

over 2 years ago

Analyzing President Bola Ahmed Tinubu's 29 MAY 2023 Inaugural Address

Based on the sentiment scores provided for the political address, we can observe the following sentiments and their corresponding counts and percentages: Anger: 23 occurrences, accounting for approximately 3.87% of the sentiment score. Anticipation: 58 occurrences, accounting for approximately 9.76% of the sentiment score. Disgust: 14 occurrences, accounting for approximately 2.36% of the sentiment score. Fear: 43 occurrences, accounting for approximately 7.24% of the sentiment score. Joy: 63 occurrences, accounting for approximately 10.61% of the sentiment score.
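The percentages quoted above can be reproduced from the counts. This is a minimal sketch in base R, assuming a total of 594 scored words. That total is not stated in the summary; it is inferred here because it is the value for which 23 occurrences round to 3.87%. Only the five categories listed above are included.

```r
# Counts as reported in the summary above.
counts <- c(anger = 23, anticipation = 58, disgust = 14, fear = 43, joy = 63)

# ASSUMPTION: the overall total is not given in the summary; 594 is inferred
# from the reported percentages (23 / 594 is about 3.87%).
total <- 594

# Share of each sentiment, as a percentage rounded to two decimals.
percentages <- round(100 * counts / total, 2)
print(percentages)
```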

over 2 years ago

Analyzing President Bola Ahmed Tinubu’s 29 MAY 2023 Inaugural Address

When the list output is TRUE, the keyword is present; when it is FALSE, the keyword is missing. The second piece of code gives the number of matches found. The third shows where the keywords were mentioned, and the last shows the text surrounding each keyword.

over 2 years ago

ETL, EDA, and Control Tests Notebook

course

over 2 years ago

Multiple Regression

Recall that our broad business question is, “How are quarterly sales affected by quarter of the year, region, and by product category (parent name)?” Up to this point we have analyzed the ability of predictor variables (aka explanatory variables) to create forecasts of quarterly revenue independently of each other. In other words, we have used one predictor variable in a regression model. This is known as simple regression. In this lesson we will investigate the benefits of using multiple variables in the same linear model to create those forecasts. When we do that it’s known as multiple regression.

over 2 years ago

Residual Analysis:

Introduction: This is an Adidas sales dataset with information on the sales of Adidas products: the number of units sold, the total sales revenue, the location of the sales, the sales outlets, and the method of sale.

over 2 years ago

Residuals and Predictions

Recall that our broad business question is, “How are quarterly sales affected by quarter of the year, region, and by product category (parent name)?” Creating a model to help answer this question can certainly be helpful for predicting future performance. Another way in which the model can be used for business purposes is to evaluate past performance.

over 2 years ago

Simple Regression

Linear models can be very effective tools for forecasting a business’s performance. Visually fitting a line to a scatter plot is effective, but it has two main

over 2 years ago

AN R PROJECT ON AFRICA CONTINENT COVID-19 DATA ANALYSIS AS OF JANUARY 2020 TO MAY 2023. Author: Ebenezer Akpati

The analysis suggests a strong linear relationship between the variable "total_cases" and the outcome variable. The coefficient estimate for "total_cases" is positive (0.01227), indicating that an increase in "total_cases" is associated with an increase in the outcome variable. The model has a good fit, as indicated by the high R-squared value and the overall significance of the model.

over 2 years ago

Simple Regression

Linear models can be very effective tools for forecasting a business’s performance. Visually fitting a line to a scatter plot is effective, but it has two main drawbacks: first, it’s subjective.

over 2 years ago

Correlation

In this lesson we will explore the concept of correlation, how to calculate correlations in R, and how correlations can be used to provide insight about the relationship between two columns of data. At this point you should have an understanding of the Teca regression data (tecaRegressionData.rds). We will use this data to provide insight to our business question, which is, “How are quarterly sales affected by quarter of the year, region, and by product category (parent name)?”

over 2 years ago

Regression Data

In this lesson you will be introduced to the data that will be used to answer our business question, which is, "How are quarterly sales affected by quarter of the year, region, and by product category (parent name)?"

over 2 years ago

MORE ON FUNCTIONS: ARGUMENTS, CREATING, PRINTING, SAVING RESULTS, RETURNING RESULTS

Don’t repeat yourself. One reason why you should not repeat code is because

over 2 years ago

Joining Data

Some useful insights can be gained when one dataset is analyzed in the context of another dataset. For instance, if weather is expected to have an influence on sales, then it may be worth combining the weather measurements to the point-of-sale data.

over 2 years ago

Stacking and Sorting Data

Large datasets are often divided into smaller dataframes and stored in separate files. For instance, point-of-sale data may be stored with a separate file for each month. Alternatively, subsets of the data may be extracted from a large database in smaller sections. In either situation, the rows from the different dataframes need to be stacked together to form a single dataframe. You might call this a vertical stack because it makes the dataframe longer.
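A vertical stack can be sketched in base R with rbind(). The lesson itself may use a different function, and the dataframes and column names below are hypothetical.

```r
# Hypothetical monthly point-of-sale extracts with identical columns.
sales_jan <- data.frame(month = "2023-01", store = c("A", "B"), revenue = c(120, 95))
sales_feb <- data.frame(month = "2023-02", store = c("A", "B"), revenue = c(130, 88))

# rbind() appends the rows of the second dataframe below the first;
# both dataframes must have the same column names.
sales_all <- rbind(sales_jan, sales_feb)

# Sort the stacked result, highest revenue first.
sales_sorted <- sales_all[order(-sales_all$revenue), ]
print(sales_sorted)
```

The stacked dataframe has as many rows as the two inputs combined, while the number of columns stays the same.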

over 2 years ago

Data Aggregation and Summary

This lesson introduces two functions from the dplyr package for aggregating data: the group_by() function and the summarise() function. We will also review how to use the lubridate package for converting strings to datetime types, as well as for rounding datetime values to date values. Finally, we will introduce the n_distinct() function for calculating the distinct number of values for different groups.
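As a rough illustration of how these functions fit together, the sketch below aggregates a small, made-up table by day. The data and column names are hypothetical, and as_date() is used here as one possible way to reduce a datetime to a date; the lesson may use a different lubridate function.

```r
library(dplyr)
library(lubridate)

# Hypothetical transactions; column names are illustrative only.
transactions <- data.frame(
  timestamp = c("2023-01-05 09:15:00", "2023-01-05 17:40:00", "2023-01-06 11:05:00"),
  customer  = c("c1", "c2", "c1"),
  amount    = c(10, 25, 5)
)

daily <- transactions %>%
  mutate(
    timestamp = ymd_hms(timestamp),  # string -> datetime
    day       = as_date(timestamp)   # datetime -> date (drops the time of day)
  ) %>%
  group_by(day) %>%
  summarise(
    total_amount = sum(amount),
    customers    = n_distinct(customer)  # distinct customers per day
  )

print(daily)
```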

over 2 years ago

Pivoting Dataframes Between Wide and Long Shapes

This lesson introduces two functions from the tidyr package for pivoting dataframes between wide and long formats. The tidyr package is part of the tidyverse, and it has functions for reshaping dataframes. The shape of a dataframe refers to the number of rows and columns. Many plotting functions and dashboard applications work best with long dataframes that have few columns and many rows. In contrast, many algorithms and human readable tables work best with wide dataframes that have few rows and many columns.
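The summary does not name the two functions, so the sketch below assumes they are tidyr's pivot_longer() and pivot_wider(). The data and column names are hypothetical.

```r
library(tidyr)

# Hypothetical wide dataframe: one row per region, one column per quarter.
wide <- data.frame(region = c("north", "south"), q1 = c(100, 80), q2 = c(110, 90))

# Wide -> long: the quarter columns become (quarter, sales) pairs.
long <- pivot_longer(wide, cols = c(q1, q2), names_to = "quarter", values_to = "sales")

# Long -> wide: spread the quarter values back out into columns.
wide_again <- pivot_wider(long, names_from = quarter, values_from = sales)

print(long)
print(wide_again)
```

The long version has four rows and three columns, while the wide version has two rows and three columns.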

over 2 years ago

Handling Missing Values

This lesson introduces some ways to deal with missing values. It’s important that missing values are either removed or filled in with imputed values so that algorithms do not throw errors.
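Both options, removing and imputing, can be sketched in base R. The data below is hypothetical, and mean imputation is only one possible choice; the lesson may cover other methods.

```r
# Hypothetical data with one missing value.
scores <- data.frame(id = 1:4, score = c(10, NA, 30, 20))

# Option 1: remove rows that contain any missing value.
scores_complete <- na.omit(scores)

# Option 2: fill missing scores with the mean of the observed scores.
scores_imputed <- scores
observed_mean <- mean(scores$score, na.rm = TRUE)
scores_imputed$score[is.na(scores_imputed$score)] <- observed_mean

print(scores_complete)
print(scores_imputed)
```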

over 2 years ago

Using dplyr's Mutate, Rename, Relocate, and Distinct Functions

This lesson focuses on four functions that simplify some common data preprocessing tasks, `mutate()`, `rename()`, `relocate()`, and `distinct()`. There are many other functions in the dplyr package for wrangling data. You should spend some time reviewing them when you want to perform a specific task.

over 2 years ago

Subsetting Data Using Filter and Select Functions

This lesson will illustrate how to use the tidyverse grammar for (1) reducing the length of the dataframe to specific rows using the filter function in the dplyr package, and (2) reducing the width of the dataframe to specific columns using the select function in the dplyr package.

over 2 years ago

Useful Operators: %>% and %in%

In this lesson you'll learn about two useful operators. The pipe operator, %>%, allows you to chain functions together. The %in% operator allows you to evaluate if a value is in a vector of values.
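A minimal sketch of both operators follows. The values are made up for illustration, and %>% is loaded here from the magrittr package (dplyr also makes it available).

```r
library(magrittr)  # provides %>%

# Hypothetical values for illustration.
regions <- c("north", "south", "east", "west", "south")

# %in% returns TRUE for each element on the left that is found
# in the vector on the right.
is_east_or_west <- regions %in% c("east", "west")

# %>% passes the left-hand result as the first argument of the next function,
# so this reads left to right instead of as length(unique(regions)).
n_regions <- regions %>% unique() %>% length()

print(is_east_or_west)
print(n_regions)
```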

over 2 years ago

Review of Notebooks and Introduction to dplyr

A brief guide to understanding the basic functions of dplyr

over 2 years ago

Quantity Virtual Internship - Retail Strategy and Analytics - Task 1

Mainstream, mid-age and young singles and couples are also more likely to pay more per packet of chips. This is indicative of impulse buying behaviour. We’ve also found that Mainstream young singles and couples are 23% more likely to purchase Tyrrells chips compared to the rest of the population. The Category Manager may want to increase the category’s performance by off-locating some Tyrrells and smaller packs of chips in discretionary space near segments that young singles and couples frequent more often, to increase visibility and impulse behaviour.

over 2 years ago

Case Study: How Does a Bike-Share Navigate Speedy Success?

This is a 12-month dataset used to determine how annual members and casual riders use the bike-share program differently, in order to design marketing strategies aimed at converting Cyclistic casual riders into annual members.

about 3 years ago
