gspyro

Gabriel Demetrios Lafis

Recently Published

Presentation
Exploratory Data Analysis for Text Prediction Algorithm
Executive Summary

This report presents an exploratory data analysis of a text corpus, carried out as the foundation for a predictive text algorithm. The primary objective was to understand the fundamental characteristics of the corpus, identify frequency patterns in words and n-grams, and establish the groundwork for an efficient predictive model. The analysis showed that word frequencies follow Zipf’s Law, determined the vocabulary coverage needed for different precision levels, and characterized the linguistic complexity of the corpus. These results provide clear guidelines for developing the predictive algorithm and the subsequent Shiny application.
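
The report's own code is not reproduced in this summary. As a rough illustration of the two measurements it mentions, n-gram frequency counts and vocabulary coverage, here is a minimal Python sketch. The language, the function names (`tokenize`, `ngram_counts`, `words_needed_for_coverage`), the simple regex tokenizer, and the toy sentence are all assumptions made for illustration; the Shiny reference suggests the original analysis was probably done in R.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (naive regex tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def ngram_counts(tokens, n):
    """Count every run of n consecutive tokens; keys are tuples of length n."""
    return Counter(zip(*(tokens[i:] for i in range(n))))


def words_needed_for_coverage(counts, target):
    """Return how many of the most frequent items are needed so that their
    combined frequency reaches `target` (a fraction of all occurrences)."""
    total = sum(counts.values())
    covered = 0
    for rank, (_, freq) in enumerate(counts.most_common(), start=1):
        covered += freq
        if covered / total >= target:
            return rank
    return len(counts)


# Toy input, standing in for the real corpus.
sample = "the cat sat on the mat and the dog sat on the rug"
tokens = tokenize(sample)
unigrams = ngram_counts(tokens, 1)
bigrams = ngram_counts(tokens, 2)

print(unigrams.most_common(3))
print(bigrams.most_common(2))
print(words_needed_for_coverage(unigrams, 0.5))
print(words_needed_for_coverage(unigrams, 0.9))
```

When frequencies follow a Zipf-like distribution, a small number of high-frequency words covers a large share of all occurrences, so the rank returned for a 50% target is typically far smaller than the rank needed for 90%.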