EcoSort: Waste Management Summative
Overview
This project explores waste management data by combining two datasets, a set of policy documents and a table of waste category descriptions. Preprocessing is done in PySpark so that the same workflow scales to large inputs.
Highlights
PySpark-based data ingestion for scalable handling of diverse waste datasets.
Integration of descriptive taxonomies (waste_descriptions.csv) and regulatory context (waste_policy_documents.json) for richer analysis.
Analytical workflows in Jupyter Notebook (waste_management_summative.ipynb) to derive actionable insights into waste streams, policies, and sustainability themes.
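The ingestion step described in the highlights might look roughly like the sketch below. It is not taken from the notebook: the function names load_waste_sources and create_session are hypothetical, the files are assumed to sit in the working directory, the CSV is assumed to have a header row, and the JSON is assumed to be a single multi-line document (Spark expects one JSON object per line unless multiLine is set).

```python
def create_session(app_name="EcoSort"):
    """Start (or reuse) a local Spark session. Hypothetical helper."""
    # Imported here so the loader below can be exercised without Spark installed.
    from pyspark.sql import SparkSession
    return SparkSession.builder.appName(app_name).getOrCreate()


def load_waste_sources(
    spark,
    descriptions_path="waste_descriptions.csv",
    policies_path="waste_policy_documents.json",
    multiline_json=True,
):
    """Read both project files and return (descriptions, policies).

    Assumptions, not confirmed by the project:
      - the CSV has a header row and its column types can be inferred
      - the JSON is one multi-line document rather than JSON Lines
    """
    descriptions = spark.read.csv(
        descriptions_path, header=True, inferSchema=True
    )
    policies = spark.read.json(policies_path, multiLine=multiline_json)
    return descriptions, policies
```

In a notebook cell this would be called as descriptions, policies = load_waste_sources(create_session()), followed by descriptions.printSchema() and policies.printSchema() to check what Spark inferred. If the policy file turns out to be JSON Lines, pass multiline_json=False.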
Purpose & Value
Ideal for demonstrating:
Large-scale data processing proficiency (PySpark)
Real-world dataset integration and exploratory workflow design
Strong foundation in environmental informatics relevant to policy and urban planning roles
Technologies
PySpark – Scalable data handling
Pandas, NumPy – Data manipulation and analysis
Jupyter Notebook – Narrative-driven workflow and visualization