I'm a Data Product Manager at a renewable energy development company in NYC, passionate about leveraging coding and data analysis to drive insights and innovation in the energy sector and beyond. This repository showcases my coding skills in SQL, Python, and R, with a focus on geospatial analysis and data engineering.
Description: This project extracts market supply charge rates from a large number of PDF documents published by an energy utility. The PDFs, accessible via hyperlinks on the Con Edison website, contain unstructured data in the form of tables and text. Using Python and the pdfplumber library, I developed a script that automates the extraction: it reads the PDF content, locates the sections of interest with pattern matching, and captures the market supply charge rates. Because the raw output is unstructured, additional preprocessing and cleaning steps transform it into a structured format suitable for further analysis, reporting, and integration with other systems. This project showcases my ability to work with unstructured data and apply PDF parsing techniques to turn utility rate documents into usable market supply charge data for the energy sector.
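A minimal sketch of the extract-then-match approach described above, assuming the rates appear in the PDF text as values like `8.1234 cents/kWh` on lines that mention "Market Supply Charge". The regular expression, the keyword, and the helper names `extract_text` and `parse_msc_rates` are hypothetical; the real Con Edison documents may use a different layout, and rates stored in tables might be better served by pdfplumber's `page.extract_tables()`.

```python
import re
from typing import List

# Hypothetical pattern: assumes rates are printed like "8.1234 cents/kWh".
RATE_RE = re.compile(r"(\d+\.\d+)\s*cents\s*/\s*kWh", re.IGNORECASE)
# Hypothetical keyword used to find the lines of interest.
KEYWORD = "market supply charge"


def extract_text(pdf_path: str) -> str:
    """Concatenate the text of every page in a PDF using pdfplumber."""
    import pdfplumber  # imported here so parse_msc_rates works without pdfplumber installed

    with pdfplumber.open(pdf_path) as pdf:
        # extract_text() returns None for pages without a text layer
        return "\n".join(page.extract_text() or "" for page in pdf.pages)


def parse_msc_rates(text: str) -> List[float]:
    """Return every cents/kWh value found on lines that mention the keyword."""
    rates: List[float] = []
    for line in text.splitlines():
        if KEYWORD in line.lower():
            # findall returns the captured numeric strings on this line
            rates.extend(float(value) for value in RATE_RE.findall(line))
    return rates
```

For example, `parse_msc_rates(extract_text("msc_statement.pdf"))` (the file name is a placeholder) would return the matched rates as floats, ready to load into a pandas DataFrame for the cleaning step.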
Description: This project compares parcel data for potential projects. The script reads two CSV files: the original table of candidate parcels and a new table produced after refining the query parameters. It removes a specified column from both tables, then merges them on their common columns to identify the differences between the two datasets. In the merged result, parcels that also appear in the original table are flagged, since they have potentially already been reviewed. The script then writes an Excel file in which these parcels are highlighted, so the recipient can easily see which parcels they may have already examined and how the new parameters changed the results. This project showcases my skills in data manipulation, merging, and generating visual outputs for effective analysis and presentation of parcel data for potential projects.
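One way to implement the compare-and-highlight step is a left merge with pandas' `indicator=True` option followed by a row-wise `Styler` rule. This is a sketch, not the original script: the function names, the `previously_reviewed` column, and the highlight color are hypothetical, and writing the styled Excel file assumes openpyxl and jinja2 are installed.

```python
import pandas as pd


def flag_previously_reviewed(original: pd.DataFrame, new: pd.DataFrame, drop_col: str) -> pd.DataFrame:
    """Mark rows of `new` that also appear in `original`, ignoring `drop_col`."""
    original = original.drop(columns=[drop_col])
    new = new.drop(columns=[drop_col])
    common = [c for c in new.columns if c in original.columns]
    # Left merge keeps every parcel from the new table; indicator adds a _merge column.
    # drop_duplicates prevents duplicate original rows from multiplying the output.
    merged = new.merge(original[common].drop_duplicates(), on=common, how="left", indicator=True)
    merged["previously_reviewed"] = merged["_merge"] == "both"
    return merged.drop(columns="_merge")


def write_highlighted_excel(df: pd.DataFrame, path: str) -> None:
    """Write df to Excel, shading previously reviewed rows (needs openpyxl and jinja2)."""

    def shade(row: pd.Series) -> list:
        style = "background-color: #FFFF99" if row["previously_reviewed"] else ""
        return [style] * len(row)

    df.style.apply(shade, axis=1).to_excel(path, index=False)
```

A left merge keeps every parcel from the refined query, so parcels that dropped out under the new parameters do not appear; merging with `how="outer"` instead would surface them as rows where `_merge == "right_only"`.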
Description: This script uses pandas in Python to perform data manipulation and transformation on a CSV file containing network data for energy distribution. It selects specific columns, sorts the data, creates a new column based on conditional logic, and replaces values in the dataset. The resulting modified dataset is saved as "networks.csv". This script demonstrates my skills in leveraging pandas for efficient data processing and manipulation in the energy sector.
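The four steps listed above (select columns, sort, derive a conditional column, replace values) map directly onto pandas calls. In this sketch the column names, the utilization threshold, and the borough code mapping are hypothetical stand-ins, since the real input schema is not shown; the conditional column uses `numpy.where`.

```python
import numpy as np
import pandas as pd

# Hypothetical schema and mapping; the real network file's columns are not shown.
COLUMNS = ["network", "borough", "peak_load_mw", "capacity_mw"]
BOROUGH_NAMES = {"BK": "Brooklyn", "MN": "Manhattan", "QN": "Queens"}


def transform_networks(df: pd.DataFrame, threshold: float = 0.9) -> pd.DataFrame:
    """Select, sort, add a conditional status column, and replace borough codes."""
    # 1. Select columns and 2. sort (copy so later assignments don't touch df)
    out = df[COLUMNS].sort_values(["borough", "network"]).reset_index(drop=True).copy()
    # 3. Conditional column: flag networks whose peak load exceeds the threshold share of capacity
    utilization = out["peak_load_mw"] / out["capacity_mw"]
    out["status"] = np.where(utilization > threshold, "constrained", "available")
    # 4. Replace coded values with readable names
    out["borough"] = out["borough"].replace(BOROUGH_NAMES)
    return out
```

Saving mirrors the original script: `transform_networks(pd.read_csv("input.csv")).to_csv("networks.csv", index=False)`, where `input.csv` is a placeholder name.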
Description: In this project, I explore geodemography through the lens of New York City to uncover the distinct characteristics and socio-economic landscapes of its neighborhoods. Starting with median income and total population, I use the tidycensus package to fetch demographic data from the US Census Bureau. I then calculate population density, an essential factor in understanding neighborhood dynamics, and clean the data. Using K-means clustering, with the elbow method to choose the number of clusters, I group neighborhoods into clusters that reveal the natural divisions within the city. Interactive maps show the spatial distribution of these clusters across the city, and scatter plots examine the relationships between demographic variables, giving a deeper understanding of the socio-economic fabric within each cluster. Exporting the clustered data to GeoJSON opens the door to additional spatial analysis in other programs such as QGIS. This project showcases my ability to combine data science, geospatial analysis, and visualization to unravel the intricate tapestry of NYC's neighborhoods.
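The original project is written in R, but the clustering logic is language-agnostic. The Python sketch below is not the author's R code: it computes population density per square mile, runs Lloyd's k-means algorithm implemented directly with numpy, and returns the within-cluster sum of squares (WCSS) for each k so the elbow can be read off a plot. The function names, the fixed random seed, and the assumption that land area arrives in square meters (the unit of the Census TIGER/Line `ALAND` field) are all illustrative choices.

```python
import numpy as np

SQ_M_PER_SQ_MILE = 2_589_988.110336  # 1 mile = 1609.344 m, squared


def population_density(population, land_area_sq_m):
    """People per square mile, assuming land area is given in square meters."""
    return np.asarray(population) / (np.asarray(land_area_sq_m) / SQ_M_PER_SQ_MILE)


def standardize(X):
    """Z-score each column so income and density contribute comparably (no zero-variance columns)."""
    return (X - X.mean(axis=0)) / X.std(axis=0)


def _assign(X, centers):
    """Index of the nearest center (squared Euclidean distance) for every row of X."""
    dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)


def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's algorithm; returns (labels, centers, wcss)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]  # random distinct rows as seeds
    for _ in range(n_iter):
        labels = _assign(X, centers)
        # Move each center to the mean of its points; keep it in place if its cluster is empty
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    labels = _assign(X, centers)  # final assignment matches the returned centers
    wcss = float(((X - centers[labels]) ** 2).sum())
    return labels, centers, wcss


def wcss_curve(X, k_max, seed=0):
    """WCSS for k = 1..k_max; plot it and pick the k where the drop levels off."""
    return [kmeans(X, k, seed=seed)[2] for k in range(1, k_max + 1)]
```

Standardizing first matters here: without it, median income (in dollars) would dominate the distance calculation and density would barely influence the clusters.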