This report describes predicting June rainfall with machine learning and compares these methods:

- A. Ordinary least squares (OLS)
- B. Partial least squares (PLS)
- C. Principal component analysis (PCA) pre-processing followed by PLS
- D. Ridge regression

# Overview

The Farmer's Almanac warns of "a cold and wet June," and it might seem foolhardy to try to improve on a maxim like this one, or to predict anything about the weather at all. This investigation nevertheless asks whether anything can be gained by applying machine learning or artificial intelligence to weather records from O'Hare Airport in Chicago, Illinois, covering the years 1960 to 2018. The analysis certainly shows how difficult weather prediction is, but it suggests that the probability of a wet June appears to be weakly correlated with February snow, cold weather in early spring, and other variables from the preceding months of the yearly weather record.

We build models with four algorithms that predict the amount of June rain and compare their mean squared error (MSE) using cross-validation. In the final tests, the bias, variance, and MSE are reported for each method. Although June rain is only weakly correlated with February snow and cold weather in March and April, these variables change the probability of a dry June by a small but statistically significant amount, particularly for the wet-cluster years with predicted rainfall over 100 mm. The years in the dry cluster have a roughly constant rain level, independent of the predicted rainfall. None of the dry-cluster years has a June rain prediction above 100 mm, while the three wettest years all have predictions above 100 mm, as shown in the exploratory plot of Figure-6.
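To make the model comparison concrete, here is a minimal pure-Python sketch of two of the four methods, OLS and ridge regression, scored by k-fold cross-validated MSE, together with a bias/variance/MSE summary of prediction errors. It is an illustration under stated assumptions, not the project's own code: PLS and PCA followed by PLS are not shown, the helper names (`fit_ridge`, `cv_mse`, `bias_variance_mse`) are hypothetical, folds are assigned by interleaving rows (an assumption), the data are synthetic, and bias and variance are taken here to be the mean and population variance of the prediction errors, which may differ from the report's definitions.

```python
import random


def solve(a, b):
    """Solve the square linear system a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_ridge(X, y, lam):
    """Ridge regression on centered data; lam = 0 gives ordinary least squares."""
    n, p = len(X), len(X[0])
    xbar = [sum(row[j] for row in X) / n for j in range(p)]
    ybar = sum(y) / n
    Xc = [[row[j] - xbar[j] for j in range(p)] for row in X]
    yc = [v - ybar for v in y]
    # Normal equations (Xc^T Xc + lam * I) w = Xc^T yc; the intercept is not penalized.
    A = [[sum(r[i] * r[j] for r in Xc) + (lam if i == j else 0.0) for j in range(p)]
         for i in range(p)]
    b = [sum(r[i] * v for r, v in zip(Xc, yc)) for i in range(p)]
    w = solve(A, b)
    intercept = ybar - sum(wj * xj for wj, xj in zip(w, xbar))
    return intercept, w


def predict(model, X):
    b0, w = model
    return [b0 + sum(wj * xj for wj, xj in zip(w, row)) for row in X]


def cv_mse(X, y, lam, k=5):
    """k-fold cross-validated MSE; fold i holds rows i, i+k, i+2k, ... (assumed scheme)."""
    n = len(X)
    sq_err = 0.0
    for i in range(k):
        test = list(range(i, n, k))
        held_out = set(test)
        train = [j for j in range(n) if j not in held_out]
        model = fit_ridge([X[j] for j in train], [y[j] for j in train], lam)
        preds = predict(model, [X[j] for j in test])
        sq_err += sum((pj - y[j]) ** 2 for pj, j in zip(preds, test))
    return sq_err / n


def bias_variance_mse(pred, actual):
    """Mean error (bias), population variance of errors, and MSE; mse == bias**2 + variance."""
    errors = [p - a for p, a in zip(pred, actual)]
    n = len(errors)
    bias = sum(errors) / n
    variance = sum((e - bias) ** 2 for e in errors) / n
    mse = sum(e * e for e in errors) / n
    return bias, variance, mse


if __name__ == "__main__":
    # Synthetic stand-in for 59 yearly records with three predictors (NOT the O'Hare data).
    random.seed(1)
    X = [[random.gauss(0, 1) for _ in range(3)] for _ in range(59)]
    y = [80 + 10 * r[0] - 5 * r[1] + random.gauss(0, 20) for r in X]
    for lam in (0.0, 1.0, 10.0, 100.0):
        print(f"lambda={lam:6.1f}  5-fold CV MSE={cv_mse(X, y, lam):8.2f}")
```

Because setting `lam = 0` reduces ridge regression to OLS, one function covers methods A and D; choosing the `lam` with the lowest cross-validated MSE is one common way to tune the ridge penalty.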
The reproducible code for this project and report is shared at this address: https://github.com/marcelMerchat/dry_june_in_illinois

# Raw Data

The raw weather data is for Station USW00094846 at O'Hare Airport in Chicago, Illinois, covering the years 1960 to 2018. The raw data was processed into a data frame of year records that also includes the level of Lakes Huron-Michigan and simulated solar irradiation levels.

The raw data for O'Hare Airport was downloaded automatically through the application programming interface (API) of the National Centers for Environmental Information (NCEI) of the United States Government. The following query parameters were appended to the API address to fetch the data:

- stations=USW00094846
- startDate=1958-01-01
- endDate=2019-07-01
- format=csv

The download is saved as the file named ChicagoWeather2018.csv.

## Numerical Data Fields

These fields were selected for analysis:

- PRCP: Precipitation (tenths of mm)
- SNOW: Snowfall (mm)
- SNWD: Snow depth (mm)
- TMAX: Maximum temperature (tenths of °C)
- TMIN: Minimum temperature (tenths of °C)
- AWND: Average daily wind speed (tenths of meters per second)
- WDF2: Direction of fastest 2-minute wind (degrees)
- WSF2: Fastest 2-minute wind speed (meters per second)
- WSF5: Gust intensity as fastest 5-second wind speed (meters per second)
- WT01: Fog, ice fog, or freezing fog (may include heavy fog)
- WT03: Thunder
- WT05: Hail
- WT08: Smoke or haze
- WT09: Blowing or drifting snow
- WT11: High or damaging winds
- WT13: Mist
- WT16: Rain (may include freezing rain, drizzle, and freezing drizzle)
- WT17: Freezing rain
- WT18: Snow, snow pellets, snow grains, or ice crystals

## Other Meteorological Data

Although only the weather data above from Station USW00094846 at O'Hare Airport is used for the final predictions, we also explored whether the level of Lakes Huron-Michigan or the solar irradiation levels were correlated with June rain. These other meteorological data were eliminated from model building and prediction because the airport weather data provides most of the predictive power.

## Lakes Huron-Michigan Water Level

The U.S. Army Corps of Engineers treats Lakes Huron and Michigan as a single body of water with the same average water level. The monthly mean water levels for Station 9075014 at Harbor Beach, MI and Station 9087044 at Calumet Harbor in Illinois were downloaded automatically from the NOAA government website, with query parameters selecting the years from the 1958-1959 snow year through the 2018-2019 snow year, where a snow year begins on July 1 and ends on June 30 of the following calendar year. The monthly mean water level for Station 9087044 at Calumet Harbor was used for the analysis. Because only airport weather data was selected for the final models, the dimension of the solution was reduced by eliminating the lake-level variables from the model. However, the lake data is still part of the data frame and of the file of all variables.

## Simulated Solar Irradiation Data

Solar data was not selected for the final models: although it has some low correlation with June rain, the weather variables in the airport data are more highly correlated. Because the gravitational pull of Jupiter perturbs the Earth's orbit and therefore the solar irradiation the Earth receives, it might be unwarranted to discard this effect without understanding how insignificant it is; the solar data therefore requires further study. The simulated solar irradiation data is described in the Appendix at Section-10.6 of the report. The code for generating the raw solar data is contained in the file named SunJupiterEarthSimulation.R. The raw data was processed to produce monthly and yearly averages as described below under preparing year records.
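The download and snow-year conventions described in the Raw Data section can be sketched in Python as follows. This is a hedged illustration, not the project's code: the base address of the NCEI Access Data Service and the `dataset=daily-summaries` parameter are assumptions (the report lists only the `stations`, `startDate`, `endDate`, and `format` parameters), and the helper names are hypothetical. The sketch labels each date with its July-to-June snow year and sums daily PRCP values, recorded in tenths of mm, into monthly totals in mm; the report's own per-field aggregation (totals versus averages) is defined in its repository.

```python
from collections import defaultdict
from datetime import date
from urllib.parse import urlencode

# ASSUMPTION: base address of the NCEI Access Data Service and the "dataset" value;
# the report lists only the stations, startDate, endDate, and format parameters.
NCEI_BASE = "https://www.ncei.noaa.gov/access/services/data/v1"


def ncei_query_url(station, start, end):
    """Build a daily-summaries CSV query URL for one station (hypothetical helper)."""
    params = {
        "dataset": "daily-summaries",
        "stations": station,
        "startDate": start,
        "endDate": end,
        "format": "csv",
    }
    return NCEI_BASE + "?" + urlencode(params)


def snow_year(d):
    """Label a date with its snow year, which runs from July 1 through June 30."""
    start = d.year if d.month >= 7 else d.year - 1
    return f"{start}-{start + 1}"


def monthly_precip_mm(records):
    """Sum daily PRCP values (tenths of mm) into totals in mm keyed by (snow year, month)."""
    totals = defaultdict(float)
    for d, prcp_tenths_mm in records:
        totals[(snow_year(d), d.month)] += prcp_tenths_mm / 10.0
    return dict(totals)


if __name__ == "__main__":
    print(ncei_query_url("USW00094846", "1958-01-01", "2019-07-01"))
    daily = [(date(2018, 6, 1), 125), (date(2018, 6, 2), 75), (date(2018, 7, 1), 10)]
    print(monthly_precip_mm(daily))
```

In this example, June 2018 belongs to the 2017-2018 snow year while July 1, 2018 starts the 2018-2019 snow year, which is why a June outcome can be paired with predictors from the preceding July through May.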
This report concerns SAIDI (System Average Interruption Duration Index) and SAIFI (System Average Interruption Frequency Index) data for more than five hundred electric utility companies that reported to the United States Department of Energy (DOE) according to the IEEE-1366 standard for the three years 2013-2015. We also discuss the event-time data in the Electric Disturbance reports for 2011-2016, which are filed on Form OE-417 for larger power disruptions. We investigate the reliability data, which was reported uniformly beginning in 2013, and the detailed event data, which became available starting in 2011.
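As a reminder of what the two indices measure, here is a minimal Python sketch of the IEEE-1366 definitions: SAIFI is the total number of customers interrupted divided by the total customers served, SAIDI is the total customer-minutes of interruption divided by the total customers served, and CAIDI = SAIDI / SAIFI. The `Interruption` record layout and the example numbers are hypothetical, and the sketch assumes the event list already contains only sustained interruptions; it does not implement the standard's major-event-day classification.

```python
from dataclasses import dataclass


@dataclass
class Interruption:
    """One sustained interruption event (hypothetical record layout)."""
    customers_interrupted: int
    duration_minutes: float


def reliability_indices(events, customers_served):
    """Return (SAIDI, SAIFI, CAIDI) for one reporting period.

    SAIFI = total customers interrupted / total customers served
    SAIDI = total customer-minutes of interruption / total customers served
    CAIDI = SAIDI / SAIFI (average minutes per interrupted customer)
    """
    if customers_served <= 0:
        raise ValueError("customers_served must be positive")
    customers_interrupted = sum(e.customers_interrupted for e in events)
    customer_minutes = sum(e.customers_interrupted * e.duration_minutes for e in events)
    saifi = customers_interrupted / customers_served
    saidi = customer_minutes / customers_served
    caidi = saidi / saifi if saifi > 0 else 0.0
    return saidi, saifi, caidi


if __name__ == "__main__":
    # Made-up example: a utility serving 10,000 customers with two outages in a year.
    events = [Interruption(2000, 90.0), Interruption(500, 30.0)]
    saidi, saifi, caidi = reliability_indices(events, 10_000)
    print(f"SAIDI={saidi:.1f} min  SAIFI={saifi:.2f}  CAIDI={caidi:.1f} min")
```

For the made-up example this prints `SAIDI=19.5 min  SAIFI=0.25  CAIDI=78.0 min`: 195,000 customer-minutes and 2,500 interrupted customers spread over 10,000 customers served.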
This web page was made using knitr with R tools. It relies on the acs package, Version 2.1.1, dated July 22, 2017, by Ezra Haber Glenn. This version runs on Microsoft Windows and uses HTTPS for its website downloads. The page describes Census Bureau income data for Lancaster County, Pennsylvania. The details of the R Markdown program are in this GitHub repository: https://github.com/marcelMerchat/census_bureau_map_income
This internet slide show describes planning a LabVIEW project.
This is an internet slide show that discusses planning a LabVIEW project.
This report describes predicting product failure rates using the Weibull probability distribution and accelerated temperature tests based on the Arrhenius equation.
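For reference, the two standard formulas named here can be sketched in Python: the Arrhenius acceleration factor AF = exp[(Ea / k)(1 / T_use - 1 / T_stress)], with absolute temperatures in kelvin and Boltzmann's constant k in eV/K, and the Weibull cumulative failure fraction F(t) = 1 - exp[-(t / η)^β] with hazard (failure rate) h(t) = (β / η)(t / η)^(β - 1). The activation energy, temperatures, and Weibull parameters in the example are hypothetical, and scaling η by AF while holding β fixed is a common modeling assumption rather than something this report states.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K


def arrhenius_af(ea_ev, t_use_c, t_stress_c):
    """Arrhenius acceleration factor between a stress and a use temperature (deg C)."""
    t_use_k = t_use_c + 273.15
    t_stress_k = t_stress_c + 273.15
    return math.exp((ea_ev / BOLTZMANN_EV_PER_K) * (1.0 / t_use_k - 1.0 / t_stress_k))


def weibull_cdf(t, beta, eta):
    """Fraction of units failed by time t for shape beta and scale eta."""
    return 1.0 - math.exp(-((t / eta) ** beta))


def weibull_hazard(t, beta, eta):
    """Instantaneous failure rate h(t) = (beta / eta) * (t / eta) ** (beta - 1)."""
    return (beta / eta) * (t / eta) ** (beta - 1)


if __name__ == "__main__":
    # Hypothetical inputs: Ea = 0.7 eV, stress test at 125 C, use at 40 C,
    # Weibull fit at the stress temperature with beta = 1.8 and eta = 2000 h.
    af = arrhenius_af(0.7, 40.0, 125.0)
    beta, eta_stress = 1.8, 2000.0
    eta_use = af * eta_stress  # assumes the shape parameter is unchanged by temperature
    ten_years_h = 10 * 8760.0
    print(f"AF = {af:.1f}")
    print(f"fraction failed after 10 years at 40 C: {weibull_cdf(ten_years_h, beta, eta_use):.4f}")
```

With β = 1 the hazard is constant at 1/η (exponential lifetimes); β > 1 models wear-out with a rising failure rate.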
The document describes an R program that queries and manages an Oracle MySQL database.
This is a reproducible report describing a breast cancer prediction algorithm based on the University of Wisconsin 2016 dataset.
This webpage describes a cancer prediction formula derived from the 2016 University of Wisconsin breast cancer data set.
This slideshow is a modified version of the slides for the Capstone Project.
This is an amended version of the slide show used for the Capstone Project.
This is the Slidify version of the Rpres slide show.
This is a Slidify version of the slide show for my Coursera Johns Hopkins Data Science Capstone Project. It was originally submitted as an Rpres document.
This is a slide show that demonstrates a text prediction algorithm for the Capstone Project of the Coursera Johns Hopkins Data Science Program.
This is a slide show that discusses a text prediction algorithm for the Coursera Johns Hopkins Data Science Program.
This is a slide show demonstration of text prediction for the Capstone Project.
This is an RStudio Presentations slide show that complements the class project for the Data Products Class of the Coursera Johns Hopkins Data Science program.
This is an HTML slide show that complements the interactive website that was submitted for the Class Project for the Data Products Class of the Coursera Johns Hopkins Data Science program.
This is an HTML RStudio Presentations slide show that describes the data product study that was prepared for the class project.
An Analysis of Weather Data for Emergency Weather Management: A Coursera Class Assignment