bjorzech

Brett Orzechowski

Recently Published

DSC609 Final MV Nonprofit/Ensemble Learning
Machine learning models assessing the fiscal health of Mohawk Valley nonprofit organizations. This is part two of two and focuses on ensemble learning.
DSC609 Final MV Nonprofit SVM/DT
Machine learning models assessing the fiscal health of Mohawk Valley nonprofit organizations. This is part one of two and focuses on SVMs and decision trees.
DSC609 MV Nonprofit Deep Learning Model with Keras
A basic deep learning model built with Keras in R. It is intended to assess the fiscal performance of nonprofits in the Mohawk Valley region of upstate New York and to strengthen the accuracy of that assessment.
DSC609 MV Nonprofit Fiscal Health Neural Network
A neural network model and analysis of larger tax-exempt organizations in the Mohawk Valley that generate annual revenues of more than $200,000. This classifier model gauges the fiscal health of these organizations.
DSC609 MV Nonprofits: Kernelized SVMs
A holistic assessment of the Mohawk Valley nonprofit ecosystem's fiscal health via supervised classification algorithms, including kernelized SVMs, decision trees, and k-NN.
DSC609 MV Nonprofit Fiscal Health Regularization
These data were extracted from Internal Revenue Service Form 990, which some tax-exempt organizations are required to submit as part of their annual reporting. In the Mohawk Valley, there are 328 tax-exempt organizations with annual revenues of more than $200,000, and these must therefore file a 990. These data offer a snapshot of the 100 highest-grossing nonprofits in Oneida and Herkimer counties in upstate New York. We will explore these data throughout the term and eventually use the full data set. Although longitudinal data are available, for this deliverable we will focus on the last full reporting year, 2018.
DSC607: K-means - Bifurcation on New York State Startups
Where should a new company start? More importantly, from a data-driven decision standpoint, is the likelihood of reaping the success of an initial public offering (IPO) or overall acquisition stronger in New York City or at any point in Upstate New York? The hypothesis states that location in New York State and other key variables do not make a considerable difference in predicting the success of a new company. For more strategic investment based on geography, are there clustering patterns that offer insight into shared characteristics and, as a result, eventual success? Initially, models were proposed to focus on region, but this focus changed after the initial submission of the project overview, once additional unsupervised algorithms, namely K-means and hierarchical structuring, had been examined. Although the variables include “region,” there is also “city,” which will not be included in early models. This is important to note because conventional wisdom dictates that five or six MSAs attract the most companies and capital (New York, Silicon Valley, Boston, Seattle, L.A. and/or Austin). There are no expectations, and the results will most likely not show a considerable difference, but outliers may exist. By running different models instead of one combined model, the results may be more conclusive, and a declarative statement can be made about whether location really matters. To do this, we will use K-means first for Upstate New York companies and then for New York City-based companies.
DSC607: KNN - Bifurcation on New York State Startups Upstate
Where should a new company start? More importantly, from a data-driven decision standpoint, is the likelihood of reaping the success of an initial public offering (IPO) or overall acquisition stronger in New York City or at any point in Upstate New York? The hypothesis states that location in New York State and other key variables do not make a considerable difference in predicting the success of a new company. For classification purposes, and to understand whether certain key characteristics (or variables) associated with traditional investing lend themselves to strengthening a company’s success, the choice was made to use nearest neighbor instead of a decision tree. k-NN can be used to determine the class label, and the approach is more flexible because it finds all the training examples that are relatively similar to the attributes of the test instances (Tan et al., 2019, p. 208). Originally, the idea was to use a decision tree to gauge whether changes in IPO or acquisition would matter, but for investors preparing for a next round of investment, this method may offer more insight through testing. By running different models instead of one combined model, the results may be more conclusive, and a declarative statement can be made about whether location really matters. To do this, we will use k-NN in this model for Upstate New York companies, while another R Markdown document will analyze New York City.
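The nearest-neighbor idea described above can be illustrated with a minimal sketch. It is written in plain Python rather than the R used in the original documents, and it is not the author's model: the function name `knn_predict`, the three features, and every data row are made up purely for illustration.

```python
from collections import Counter
from math import dist  # Euclidean distance, available in Python 3.8+


def knn_predict(train, query, k=3):
    """Return the majority label among the k training rows closest to query."""
    # Sort training rows by distance to the query and keep the k nearest.
    neighbors = sorted(train, key=lambda row: dist(row[0], query))[:k]
    # Majority vote over the labels of those neighbors.
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Hypothetical toy rows (NOT real startup data):
# (funding in $M, number of investors, number of rounds) -> outcome
train = [
    ((1.0, 2, 1), "operating"),
    ((1.5, 3, 1), "operating"),
    ((2.0, 2, 2), "operating"),
    ((20.0, 8, 4), "acquired_or_ipo"),
    ((25.0, 10, 5), "acquired_or_ipo"),
    ((30.0, 9, 4), "acquired_or_ipo"),
]

print(knn_predict(train, (22.0, 9, 4), k=3))
```

In practice the features would usually be scaled before computing distances, since a variable with a large range (such as funding) would otherwise dominate the vote.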
DSC607: KNN - Bifurcation on New York State Startups NYC
Where should a new company start? More importantly, from a data-driven decision standpoint, is the likelihood of reaping the success of an initial public offering (IPO) or overall acquisition stronger in New York City or at any point in Upstate New York? The hypothesis states that location in New York State and other key variables do not make a considerable difference in predicting the success of a new company. For classification purposes, and to understand whether certain key characteristics (or variables) associated with traditional investing lend themselves to strengthening a company’s success, the choice was made to use nearest neighbor instead of a decision tree. k-NN can be used to determine the class label, and the approach is more flexible because it finds all the training examples that are relatively similar to the attributes of the test instances (Tan et al., 2019, p. 208). Originally, the idea was to use a decision tree to gauge whether changes in IPO or acquisition would matter, but for investors preparing for a next round of investment, this method may offer more insight through testing.
DSC607: Logistic - Bifurcation on New York State Startups
Where should a new company start? More importantly, from a data-driven decision standpoint, is the likelihood of reaping the success of an initial public offering (IPO) or overall acquisition stronger in New York City or at any point in Upstate New York? The hypothesis states that location in New York State and other key variables do not make a considerable difference in predicting the success of a new company. The choice was made to begin with logistic regression to understand whether each subset of the data (New York City and Upstate) proved to be more favorable to startup companies in terms of probability of success. By running different models instead of one combined model, the results may be more conclusive, and a declarative statement can be made about whether location really matters.
DSC607 Upstate Startup Clustering
An analysis of the Upstate New York startup ecosystem via two unsupervised learning algorithms – K-means and principal component analysis (PCA).
DSC607 Upstate Acquired/IPO DT and KNN
For this exploration, I wanted to examine the probability of either an acquisition or IPO based on a number of variables for classification purposes – level of funding, number of investors, number of investment rounds, and whether the company is still operating and accepting investment or has been acquired or exercised an IPO.
DSC607: Module One
An overview of DSC607 (Data Mining): Module One