RockBen

Ebenezer U

Recently Published

Data Uniformity: A Statistic Assessment and management of Outlier and Missing Values In R
The uniformity of a dataset helps the analyst obtain more accurate results. Two major threats to accuracy are outliers and missing values that are not handled well. Pre-processing the data is therefore a crucial step in any analysis and a focal point for any analyst interested in getting accurate insight from the dataset.
Dummy Variables
Analyzing President Bola Ahmed Tinubu's 29 MAY 2023 Inaugural Address
Based on the sentiment scores provided for the political address, we can observe the following sentiments and their corresponding counts and percentages: Anger: 23 occurrences, accounting for approximately 3.87% of the sentiment score. Anticipation: 58 occurrences, accounting for approximately 9.76% of the sentiment score. Disgust: 14 occurrences, accounting for approximately 2.36% of the sentiment score. Fear: 43 occurrences, accounting for approximately 7.24% of the sentiment score. Joy: 63 occurrences, accounting for approximately 10.61% of the sentiment score.
Analyzing President Bola Ahmed Tinubu’s 29 MAY 2023 Inaugural Address
When the list output is true, the keyword is present; when it is false, the keyword is missing. The second piece of code tells us the number of matches found. The third tells us where the keywords were mentioned, and the last lets us see the text surrounding the keywords.
ETL, EDA, and Control Tests Notebook
course
Multiple Regression
Recall that our broad business question is, “How are quarterly sales affected by quarter of the year, region, and by product category (parent name)?” Up to this point we have analyzed the ability of predictor variables (aka explanatory variables) to create forecasts of quarterly revenue independently of each other. In other words, we have used one predictor variable in a regression model. This is known as simple regression. In this lesson we will investigate the benefits of using multiple variables in the same linear model to create those forecasts. When we do that it’s known as multiple regression.
Residual Analysis:
Introduction: This is an Adidas sales dataset with information on the sales of Adidas products, the number of units sold, the total sales revenue, the location of the sales, the sales outlets, and the method of sales.
Residuals and Predictions
Recall that our broad business question is, “How are quarterly sales affected by quarter of the year, region, and by product category (parent name)?” Creating a model to help answer this question can certainly be helpful for predicting future performance. Another way in which the model can be used for business purposes is to evaluate past performance.
Simple Regression
Linear models can be very effective tools for forecasting a business’s performance. Visually fitting a line to a scatter plot is effective, but it has two main
AN R PROJECT ON AFRICA CONTINENT COVID-19 DATA ANALYSIS AS OF JANUARY 2020 TO MAY 2023 (Author: Ebenezer Akpati)
The analysis suggests a strong linear relationship between the variable "total_cases" and the outcome variable. The coefficient estimate for "total_cases" is positive (0.01227), indicating that an increase in "total_cases" is associated with an increase in the outcome variable. The model has a good fit, as indicated by the high R-squared value and the overall significance of the model.
Simple Regression
Linear models can be very effective tools for forecasting a business’s performance. Visually fitting a line to a scatter plot is effective, but it has two main drawbacks: first, it’s subjective.
Correlation
In this lesson we will explore the concept of correlation, how to calculate correlations in R, and how correlations can be used to provide insight about the relationship between two columns of data. At this point you should have an understanding of the Teca regression data (tecaRegressionData.rds). We will use this data to provide insight to our business question, which is, “How are quarterly sales affected by quarter of the year, region, and by product category (parent name)?”
Regression Data
In this lesson you will be introduced to the data that will be used to answer our business question, which is, "How are quarterly sales affected by quarter of the year, region, and by product category (parent name)?"
MORE ON FUNCTIONS: ARGUMENTS, CREATING, PRINTING, SAVING RESULTS, RETURNING RESULTS
Don’t repeat yourself. One reason why you should not repeat code is because
Joining Data
Some useful insights can be gained when one dataset is analyzed in the context of another dataset. For instance, if weather is expected to have an influence on sales, then it may be worth combining the weather measurements to the point-of-sale data.
Stacking and Sorting Data
Large datasets are often divided into smaller dataframes and stored in separate files. For instance, point-of-sale data may be stored in such a way that there’s a separate file for each month. Alternatively, subsets of the data are extracted from a large database in smaller sections. In either of these situations, the rows from the different dataframes need to be stacked together to form a single dataframe. You might call this a vertical stack because it makes the dataframe longer.
Data Aggregation and Summary
This lesson introduces two functions from the dplyr package for aggregating data: the group_by() function and the summarise() function. We will also review how to use the lubridate package for converting strings to datetime types, as well as for rounding datetime values to date values. Finally, we will introduce the n_distinct() function for calculating the distinct number of values for different groups.
Pivoting Dataframes Between Wide and Long Shapes
This lesson introduces two functions from the tidyr package for pivoting dataframes between wide and long formats. The tidyr package is part of the tidyverse, and it has functions for reshaping dataframes. The shape of a dataframe refers to the number of rows and columns. Many plotting functions and dashboard applications work best with long dataframes that have few columns and many rows. In contrast, many algorithms and human readable tables work best with wide dataframes that have few rows and many columns.
Handling Missing Values
This lesson introduces some ways to deal with missing values. It’s important that missing values are either removed or filled in with imputed values so that algorithms do not throw errors.
Using dplyr's Mutate, Rename, Relocate, and Distinct Functions
This lesson focuses on four functions that simplify some common data preprocessing tasks: `mutate()`, `rename()`, `relocate()`, and `distinct()`. There are many other functions in the dplyr package for wrangling data. You should spend some time reviewing them when you want to perform a specific task.
Subsetting Data Using Filter and Select Functions
This lesson will illustrate how to use the tidyverse grammar for (1) reducing the length of the dataframe to specific rows using the filter function in the dplyr package, and (2) reducing the width of the dataframe to specific columns using the select function in the dplyr package.
Useful Operators: %>% and %in%
In this lesson you'll learn about two useful operators. The pipe operator, %>%, allows you to chain functions together. The %in% operator allows you to evaluate if a value is in a vector of values.
Review of Notebooks and Introduction to dplyr
A brief introduction to the basic functions of dplyr.
Quantium Virtual Internship - Retail Strategy and Analytics - Task 1
Mainstream midage and young singles and couples are also more likely to pay more per packet of chips, which is indicative of impulse buying behaviour. We’ve also found that Mainstream young singles and couples are 23% more likely to purchase Tyrrells chips compared to the rest of the population. The Category Manager may want to increase the category’s performance by off-locating some Tyrrells and smaller packs of chips in discretionary space near segments that young singles and couples frequent more often, to increase visibility and impulse behaviour.
Case Study: How Does a Bike-Share Navigate Speedy Success?
This is an analysis of a 12-month dataset to determine how annual members and casual riders use the bike-share program differently, in order to design marketing strategies aimed at converting Cyclistic casual riders into annual members.