bjorzech

Brett Orzechowski

Recently Published

DSC607: K-means - Bifurcation on New York State Startups
Where should a new company start? More importantly, from a data-driven standpoint, is the likelihood of a successful initial public offering (IPO) or acquisition stronger in New York City or in Upstate New York? The hypothesis states that location within New York State and other key variables do not make a considerable difference in predicting the success of a new company. For more strategic investment based on geography, are there clustering patterns that offer insight into shared characteristics and, as a result, eventual success? Initially, the models were meant to focus on region, but after examining additional unsupervised algorithms, namely K-means and hierarchical clustering, this focus changed after the initial submission of the project overview. The data set includes a “region” variable as well as a “city” variable, and the latter will not be included in early models. This is important to note because conventional wisdom dictates that five or six metropolitan statistical areas (MSAs) attract the most companies and capital (New York, Silicon Valley, Boston, Seattle, L.A. and/or Austin). There are no firm expectations: the results will most likely not show a considerable difference, but outliers may exist. By running separate models instead of one combined model, the results may be more conclusive and a declarative statement can be made about whether location really matters. To do this, we will use K-means first for Upstate New York companies and then for New York City-based companies.
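The idea of clustering each region separately can be illustrated with a minimal sketch. It is written in plain Python rather than the R used in the published document, and it is not the author's code. The two-column rows (funding in $M, number of investors) are made-up values for illustration only, and the function name `kmeans` and the choice to seed centroids with the first k rows are assumptions made to keep the example deterministic.

```python
import math

def kmeans(points, k, iterations=20):
    """Plain Lloyd's algorithm. Centroids start at the first k points
    (an assumption made so the sketch is deterministic)."""
    centroids = [tuple(p) for p in points[:k]]
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return assignments, centroids

# Hypothetical rows, NOT real startup data: (funding in $M, number of investors).
upstate = [(1.0, 2), (20.0, 8), (1.5, 3), (25.0, 10)]
nyc = [(2.0, 3), (40.0, 12), (3.0, 2), (35.0, 15)]

# One model per region instead of a single combined model.
upstate_labels, upstate_centroids = kmeans(upstate, k=2)
nyc_labels, nyc_centroids = kmeans(nyc, k=2)
```

In practice the features would usually be standardized first, since K-means relies on Euclidean distance and a large-valued column such as funding would otherwise dominate.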
DSC607: KNN - Bifurcation on New York State Startups Upstate
Where should a new company start? More importantly, from a data-driven standpoint, is the likelihood of a successful initial public offering (IPO) or acquisition stronger in New York City or in Upstate New York? The hypothesis states that location within New York State and other key variables do not make a considerable difference in predicting the success of a new company. For classification purposes, and to understand whether certain key characteristics (variables) associated with traditional investing strengthen a company’s chances of success, the choice was made to use nearest neighbor instead of a decision tree. k-NN can be used to determine the class label, and the approach is more flexible because it finds all the training examples that are relatively similar to the attributes of the test instance (Tan et al., 2019, p. 208). Originally, the idea was to use a decision tree to gauge whether changes in IPO or acquisition would matter, but for investors preparing for a next round of investment, this method may offer more insight through testing. By running separate models instead of one combined model, the results may be more conclusive and a declarative statement can be made about whether location really matters. To do this, we will use k-NN for Upstate New York companies in this model, while another R Markdown document will analyze New York City.
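The nearest-neighbor idea described above (label a test instance by the training examples most similar to it) can be sketched as follows. This is a plain-Python illustration, not the R code from the published document. The rows, the labels "operating" and "exit", and the function name `knn_predict` are all hypothetical.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Return the majority label among the k training rows closest to query."""
    distances = sorted(
        (math.dist(row, query), label) for row, label in zip(train_X, train_y)
    )
    top_labels = [label for _, label in distances[:k]]
    return Counter(top_labels).most_common(1)[0][0]

# Hypothetical rows, NOT real startup data:
# (funding in $M, number of investors, number of rounds)
train_X = [(1.0, 2, 1), (1.5, 3, 1), (2.0, 2, 2),
           (20.0, 8, 4), (25.0, 10, 5), (30.0, 9, 4)]
train_y = ["operating", "operating", "operating", "exit", "exit", "exit"]

prediction_small = knn_predict(train_X, train_y, (1.2, 2, 1))
prediction_large = knn_predict(train_X, train_y, (22.0, 9, 4))
```

Because the prediction depends entirely on distance, features measured on different scales would normally be rescaled before fitting.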
DSC607: KNN - Bifurcation on New York State Startups NYC
Where should a new company start? More importantly, from a data-driven standpoint, is the likelihood of a successful initial public offering (IPO) or acquisition stronger in New York City or in Upstate New York? The hypothesis states that location within New York State and other key variables do not make a considerable difference in predicting the success of a new company. For classification purposes, and to understand whether certain key characteristics (variables) associated with traditional investing strengthen a company’s chances of success, the choice was made to use nearest neighbor instead of a decision tree. k-NN can be used to determine the class label, and the approach is more flexible because it finds all the training examples that are relatively similar to the attributes of the test instance (Tan et al., 2019, p. 208). Originally, the idea was to use a decision tree to gauge whether changes in IPO or acquisition would matter, but for investors preparing for a next round of investment, this method may offer more insight through testing.
DSC607: Logistic - Bifurcation on New York State Startups
Where should a new company start? More importantly, from a data-driven standpoint, is the likelihood of a successful initial public offering (IPO) or acquisition stronger in New York City or in Upstate New York? The hypothesis states that location within New York State and other key variables do not make a considerable difference in predicting the success of a new company. The choice was made to begin with logistic regression to understand whether either subset of the data (New York City or Upstate) is more favorable to startup companies in terms of probability of success. By running separate models instead of one combined model, the results may be more conclusive and a declarative statement can be made about whether location really matters.
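Fitting one logistic regression per subset and comparing the predicted probabilities can be sketched as below. This is a plain-Python illustration using batch gradient descent, not the R code from the published document. The single feature (number of funding rounds), the 0/1 outcomes, and the names `fit_logistic` and `predict_proba` are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.1, epochs=5000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n_features = len(X[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for row, target in zip(X, y):
            error = sigmoid(bias + sum(w * x for w, x in zip(weights, row))) - target
            for j, x in enumerate(row):
                grad_w[j] += error * x
            grad_b += error
        weights = [w - lr * g / len(X) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(X)
    return weights, bias

def predict_proba(weights, bias, row):
    """Estimated probability of the positive class (acquired or IPO)."""
    return sigmoid(bias + sum(w * x for w, x in zip(weights, row)))

# Hypothetical data, NOT real startup data: feature = number of funding rounds,
# target = 1 if the company was acquired or went public, else 0.
rounds = [(1,), (2,), (3,), (4,), (5,), (6,)]
upstate_y = [0, 0, 0, 1, 1, 1]
nyc_y = [0, 0, 1, 1, 1, 1]

# One model per region, so the fitted probabilities can be compared.
upstate_w, upstate_b = fit_logistic(rounds, upstate_y)
nyc_w, nyc_b = fit_logistic(rounds, nyc_y)

upstate_low = predict_proba(upstate_w, upstate_b, (1,))
upstate_high = predict_proba(upstate_w, upstate_b, (6,))
nyc_low = predict_proba(nyc_w, nyc_b, (1,))
nyc_high = predict_proba(nyc_w, nyc_b, (6,))
```

Comparing the two fitted models at the same feature values is one simple way to see whether the regional subsets behave differently.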
DSC607 Upstate Startup Clustering
An analysis of the Upstate New York startup ecosystem via two unsupervised learning algorithms – K-means and principal component analysis (PCA).
DSC607 Upstate Acquired/IPO DT and KNN
For this exploration, I wanted to examine the probability of either an acquisition or an IPO based on a number of variables for classification purposes: level of funding, number of investors, number of investment rounds, and whether the company is still operating and accepting investment or has been acquired or completed an IPO.
DSC607: Module One
An overview of DSC607 (Data Mining): Module One