Recently Published
MACROECONOMIC ANALYSIS: PKL (FIELD WORK PRACTICE) REPORT
Analyzes future macroeconomic conditions at the regional level and formulates appropriate strategies for responding to those economic dynamics. The analysis uses the Vector Autoregression (VAR) method on quarterly data for the period 2011–2025.
Insurance Data Analysis
This project demonstrates the transformation of the Insurance dataset from wide format to long (tidy) format using R's tidyverse package. The Insurance dataset contains health insurance information for 1,338 individuals, including demographic characteristics and healthcare charges.
The primary objectives of this data transformation project are to:
- Restructure the data from wide format (where multiple measurements exist as separate columns) to long format (where each measurement becomes its own row)
- Apply tidy data principles to make the dataset more suitable for statistical analysis and visualization
- Demonstrate best practices in data wrangling and preparation for data science workflows
- Fulfill DATA 624 course requirements by showcasing proficiency in data transformation techniques essential for master's-level data analysis
Primary School Statistics
This is a unified Statistics curriculum for primary school.
One-Population Proportion Test (Binomial Test)
This study presents an application of the Binomial Test, an exact statistical method, for testing hypotheses about a single population proportion. Using R/RStudio, the article analyzes a real-world dataset on the social media habits of school and university students. The goal is to draw statistically meaningful conclusions about issues such as the rate of social media addiction and its negative impact on academic performance.
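To illustrate what an exact binomial test computes, here is a minimal Python sketch (the study itself uses R/RStudio, so this is not the author's code). The function names and the counts in the usage line are hypothetical. The two-sided p-value is taken as the total probability of all outcomes no more likely than the observed one, which is one common convention.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n trials with success rate p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binomial_test_two_sided(successes, n, p0):
    """Exact two-sided p-value for H0: true proportion equals p0.

    Sums the probabilities of every outcome that is no more likely
    than the observed count (hypothetical helper, not a library API).
    """
    observed = binomial_pmf(successes, n, p0)
    tolerance = 1 + 1e-7  # guards against floating-point ties
    p_value = sum(
        binomial_pmf(k, n, p0)
        for k in range(n + 1)
        if binomial_pmf(k, n, p0) <= observed * tolerance
    )
    return min(1.0, p_value)


# Hypothetical example: 14 of 20 respondents report a given habit; H0: p = 0.5.
p_value = binomial_test_two_sided(14, 20, 0.5)
print(round(p_value, 4))
```

A small p-value would suggest the population proportion differs from the hypothesized `p0`; the counts above are invented and do not come from the study's dataset.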
H517 Research Paper Summary
My summary of a 2019 research paper on how visualization design can affect how users perceive causality in correlated variables.
Energy Usage Analysis System
This analysis uses the Energy Usage Analysis System dataset, which tracks energy consumption across government facilities. It focuses on electricity, natural gas, fuel oil, chilled water, and steam usage from 1973 to 2019, as published by the U.S. General Services Administration.
Tidying Wide Datasets to produce Long Datasets
Tidying wide datasets involves transforming data from a format where multiple measurements are spread across separate columns into a long format where each row represents a single observation. In wide format, each subject or entity occupies one row with many columns representing different variables or time periods, which can make filtering, grouping, and visualization challenging.
The transformation process uses functions like `pivot_longer()` in R or `melt()` in Python to collapse multiple measurement columns into two key columns: one identifying the type of measurement and another containing the actual value. This restructuring follows tidy data principles where each variable forms a column, each observation forms a row, and each type of observational unit forms a table, making the data more suitable for statistical analysis and machine learning algorithms.
The result is a dataset with more rows but fewer columns that is easier to filter by measurement type, visualize, and analyze using modern data science tools.
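The wide-to-long reshaping described above can be sketched with the `melt()` function from pandas. The table, its column names (`subject`, `week1`, `week2`), and the output column names are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Hypothetical wide table: one row per subject, one column per measurement.
wide = pd.DataFrame(
    {
        "subject": ["A", "B"],
        "week1": [5.0, 7.5],
        "week2": [6.0, 8.0],
    }
)

# Collapse the measurement columns into two key columns:
# "measurement" names the original column, "value" holds its entry.
tidy = wide.melt(
    id_vars="subject",
    var_name="measurement",
    value_name="value",
)

print(tidy)

# Filtering by measurement type is now a plain row filter.
week2_only = tidy[tidy["measurement"] == "week2"]
```

Here two rows and three columns become four rows and three columns; with more measurement columns, the long table grows in rows while the column count stays fixed.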
Animated Suppression Gap (ΔX) Map: Revealing Hidden Suicide Burden Across Reporting Ranges
This interactive map visualizes how suicide data visibility changes as reporting ranges expand.
Each frame shows the ΔX (percentage-point drop in data suppression) between consecutive range lengths.
Red regions represent states where suicide data become visible only with multi-year aggregation, highlighting areas of chronic, low-frequency suicide burden that remain masked in annual reporting. This set includes only the beta test range data and does not include rolling range data.
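As a rough illustration of the ΔX arithmetic, the sketch below assumes each state has a suppression rate (percent of records suppressed) for each range length, ordered from shortest to longest. The function name and the example rates are hypothetical; the map's actual computation may differ.

```python
def suppression_gaps(suppression_pct):
    """Return ΔX for each pair of consecutive range lengths.

    ΔX is taken here as the percentage-point drop in suppression when
    moving to the next, longer range (positive means more data visible).
    """
    return [
        round(shorter - longer, 2)
        for shorter, longer in zip(suppression_pct, suppression_pct[1:])
    ]


# Hypothetical suppression rates (%) for 1-, 2-, and 3-year ranges in one state.
rates = [40.0, 25.0, 10.0]
print(suppression_gaps(rates))
```

Under this assumption, a large positive ΔX marks a state whose data become visible mainly through multi-year aggregation.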