An interactive Streamlit dashboard to explore renovation permit trends across NYC using NLP, machine learning, and visual analytics.
Launch Dashboard: Click to open the Streamlit app
Experience real-time filtering, keyword exploration, topic modeling, clustering, and predictive analytics, all in one unified interface.
This project delivers a comprehensive NYC Renovation Trend Analysis System that combines NLP, clustering, machine learning, and a Streamlit-based interactive dashboard to cover the full analysis cycle, from raw permit data to interactive exploration and predictions.
It uses historical NYC renovation permit filings (2010–2020) to uncover patterns and build models, including:
- Borough-wise renovation activity and cost trends
- Topic modeling (TF-IDF + NMF) to extract dominant renovation themes
- KMeans clustering with PCA visualization to classify renovation types
- Time-series decomposition to detect seasonal patterns
- ML-based predictions of renovation category and estimated cost
- Exportable visuals and cleaned datasets for further analysis
The dashboard brings these analytical components together with interactive tools for exploring the raw permit data. Its main capabilities are:
- Interactive filtering by borough, job type, and time period
- TF-IDF + NMF-based topic extraction from job descriptions
- KMeans clustering with PCA-based 2D visualization
- Word clouds and labeled clusters for clear interpretation
- Seasonal and yearly renovation trend analysis
- Live predictions using pre-trained ML models (category and cost)
- Exportable charts and downloadable cleaned data
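Below is a minimal sketch of the clustering view. It assumes KMeans runs on TF-IDF vectors of job descriptions and PCA projects them to two dimensions for plotting. The sample descriptions, the cluster count, and the helper name `cluster_and_project` are hypothetical, not taken from the project.

```python
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical descriptions forming two obvious groups (kitchen vs. sprinkler/alarm work).
descriptions = [
    "kitchen cabinets countertops",
    "kitchen cabinets sink",
    "kitchen countertops sink",
    "sprinkler alarm piping",
    "alarm sprinkler upgrade",
    "sprinkler piping alarm valve",
]


def cluster_and_project(texts, n_clusters=2, random_state=0):
    """Cluster TF-IDF vectors with KMeans and return labels plus 2D PCA coordinates."""
    tfidf = TfidfVectorizer(stop_words="english").fit_transform(texts)
    kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=random_state)
    labels = kmeans.fit_predict(tfidf)               # KMeans accepts the sparse matrix
    # PCA is run on a dense copy; acceptable for a small sample like this one.
    coords = PCA(n_components=2, random_state=random_state).fit_transform(tfidf.toarray())
    return labels, coords


labels, coords = cluster_and_project(descriptions, n_clusters=2)
for text, label, (x, y) in zip(descriptions, labels, coords):
    print(f"cluster {label}  ({x:+.2f}, {y:+.2f})  {text}")
```

For the full permit corpus, densifying the TF-IDF matrix can be expensive. TruncatedSVD is a common alternative because it works on the sparse matrix directly. The `coords` array can be passed straight to a Plotly or Matplotlib scatter plot, colored by `labels`.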
The dataset used in this project is sourced from the official NYC Open Data platform:
NYC DOB Permit Issuance Dataset
This dataset contains detailed information about construction permits issued by the New York City Department of Buildings (DOB) from 2010 to 2020. It includes attributes such as:
- Job type and description
- Permit issuance date
- Borough and location
- Estimated cost of work
- Permit type, status, and more
This rich dataset forms the foundation for all NLP, clustering, and cost prediction models used in the analysis.
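Before modeling, raw permit records usually need cleaning. The sketch below shows one way to do that with pandas, plus the borough, job type, and date filtering the dashboard exposes. Several details are hypothetical:

- the column names (`borough`, `job_type`, `job_description`, `issuance_date`, `estimated_cost`);
- the inline sample rows;
- the helper names `clean_permits` and `filter_permits`.

The real DOB export uses its own column names, which would need to be mapped onto these first.

```python
import io

import pandas as pd

# Hypothetical sample rows; real DOB columns would be renamed to these names first.
SAMPLE_CSV = """borough,job_type,job_description,issuance_date,estimated_cost
MANHATTAN,A2,Kitchen renovation,2015-03-14,"$25,000"
BROOKLYN,A2,Bathroom remodel,2016-07-02,18000
QUEENS,A3,,2017-01-20,5000
BRONX,A1,Convert basement to apartment,not a date,120000
MANHATTAN,A2,Replace windows,2019-11-05,n/a
"""


def clean_permits(raw: pd.DataFrame) -> pd.DataFrame:
    """Normalize text, parse dates and costs, and drop rows unusable for analysis."""
    df = raw.copy()
    df["borough"] = df["borough"].str.strip().str.title()
    df["issuance_date"] = pd.to_datetime(df["issuance_date"], errors="coerce")
    # Strip "$" and thousands separators, then coerce anything unparseable to NaN.
    df["estimated_cost"] = pd.to_numeric(
        df["estimated_cost"].astype(str).str.replace(r"[$,]", "", regex=True),
        errors="coerce",
    )
    df["job_description"] = df["job_description"].fillna("").str.strip().str.lower()
    # Keep rows that have text (for NLP) and a valid date (for time-series analysis).
    df = df[(df["job_description"] != "") & df["issuance_date"].notna()]
    return df.reset_index(drop=True)


def filter_permits(df, boroughs=None, job_types=None, start=None, end=None):
    """Apply optional borough, job-type, and date-range filters."""
    mask = pd.Series(True, index=df.index)
    if boroughs:
        mask &= df["borough"].isin(boroughs)
    if job_types:
        mask &= df["job_type"].isin(job_types)
    if start is not None:
        mask &= df["issuance_date"] >= pd.Timestamp(start)
    if end is not None:
        mask &= df["issuance_date"] <= pd.Timestamp(end)
    return df[mask]


permits = clean_permits(pd.read_csv(io.StringIO(SAMPLE_CSV)))
recent = filter_permits(permits, boroughs=["Manhattan", "Brooklyn"], start="2016-01-01")
print(recent)
```

Rows with an unparseable cost are kept, with the cost set to NaN, so they still contribute to text and trend analysis. Cost models would drop them separately.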
renovation-trend-analysis/
├── data/                   # Datasets used for analysis
│   ├── raw_data/           # Original downloaded NYC permit CSVs
│   └── processed_data/     # Cleaned/transformed datasets
│
├── models/                 # Trained ML model files
│   └── *.pkl               # Saved models (e.g., NMF, KMeans, RandomForest)
│
├── reports/
│   └── images/             # Exported plots, word clouds, visuals
│       ├── image1.png
│       ├── image2.png
│       └── ...
│
├── src/                    # Source code
│   └── streamlit_app.py    # Streamlit dashboard code
│
├── requirements.txt        # Python dependencies
└── README.md               # Project overview and documentation
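In this layout, the app in `src/` reads from `data/` and `models/` at the repository root. The sketch below resolves those folders from the location of `src/streamlit_app.py`, so the app works regardless of the current working directory. It assumes exactly the layout above. The helper names `project_paths` and `list_models` are hypothetical, and the demo builds a throwaway copy of the tree in a temporary directory.

```python
import tempfile
from pathlib import Path


def project_paths(app_file: str) -> dict:
    """Resolve project folders relative to src/streamlit_app.py (assumed layout)."""
    root = Path(app_file).resolve().parent.parent    # src/streamlit_app.py -> project root
    return {
        "root": root,
        "raw_data": root / "data" / "raw_data",
        "processed_data": root / "data" / "processed_data",
        "models": root / "models",
        "images": root / "reports" / "images",
    }


def list_models(models_dir: Path) -> list:
    """Return the names of saved model files (*.pkl), sorted alphabetically."""
    return sorted(p.name for p in models_dir.glob("*.pkl"))


# Demo: mimic the repository layout in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "renovation-trend-analysis"
    (root / "src").mkdir(parents=True)
    (root / "models").mkdir()
    (root / "models" / "nmf.pkl").touch()
    (root / "models" / "kmeans.pkl").touch()
    app_file = root / "src" / "streamlit_app.py"
    app_file.touch()

    paths = project_paths(str(app_file))
    found_models = list_models(paths["models"])
    resolved_root_ok = paths["root"] == root.resolve()

print(found_models)
```

Inside the real app, the call would simply be `project_paths(__file__)`.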
The project combines the following data science and engineering tools to deliver interactive analytics, machine learning, and NLP in a single deployable app:
- Streamlit: For building interactive web dashboards and visualization UI.
- Scikit-learn: Core ML library used for classification, regression, and clustering (Random Forest, KMeans), with GridSearchCV for hyperparameter tuning.
- Pipeline: Pipeline and make_pipeline, used for chaining preprocessing and modeling.
- Model Persistence: joblib for saving/loading trained ML models.
- TF-IDF (TfidfVectorizer): Vectorizes job descriptions for topic modeling and ML.
- NMF (Non-negative Matrix Factorization): Topic extraction from TF-IDF features.
- KMeans: For job clustering based on textual features.
- PCA (Principal Component Analysis): For reducing dimensionality and visualizing clusters.
- statsmodels: seasonal_decompose, used to identify seasonality and trends in renovation permits over time.
- Pandas, NumPy: Core libraries for data cleaning, transformation, and manipulation.
- Regex, IO, OS: Utilities for data handling, parsing, and dynamic path management.
- Matplotlib, Seaborn: Static charts, trend lines, box plots.
- Plotly Express: Interactive bar and line charts in Streamlit.
- WordCloud: To generate word clouds for dominant job themes.
Key visuals generated from the analysis, including permit trends, cost distributions, topic models, and clustering insights, are exported to reports/images/.
To run the dashboard locally:
git clone https://github.com/Harish-34/renovation-trend-analysis.git
cd renovation-trend-analysis
python -m venv venv
source venv/bin/activate # For Windows: venv\Scripts\activate
pip install -r requirements.txt
streamlit run src/streamlit_app.py
This project provides multiple actionable outputs that can be directly used or extended:
- Downloadable Cleaned Datasets: Ready-to-use CSV files for further analysis
- Exportable Visuals: PNG charts for presentations and reports
- Live Prediction Tool: Instant ML-based predictions of renovation category and cost
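The sketch below shows one way a live prediction tool like this could be wired up:

1. Train two scikit-learn pipelines (TF-IDF followed by a random forest) on a few made-up descriptions.
2. Save them with joblib.
3. Reload them and query them.

Several details are hypothetical:

- the training texts, category labels, and cost values;
- the file names;
- the `predict` helper.

The project's actual features, models, and any GridSearchCV tuning are not reproduced here.

```python
import os
import tempfile

import joblib
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline

# Hypothetical training data: description -> category and estimated cost (USD).
texts = [
    "kitchen cabinets countertops sink",
    "kitchen remodel new cabinets",
    "bathroom tiles plumbing fixtures",
    "bathroom remodel shower tiles",
    "sprinkler alarm piping upgrade",
    "sprinkler heads alarm panel",
]
categories = ["kitchen", "kitchen", "bathroom", "bathroom", "fire safety", "fire safety"]
costs = [30000, 25000, 15000, 18000, 40000, 45000]

# Each pipeline vectorizes raw text and then applies a random forest.
category_model = make_pipeline(
    TfidfVectorizer(), RandomForestClassifier(n_estimators=100, random_state=0)
)
cost_model = make_pipeline(
    TfidfVectorizer(), RandomForestRegressor(n_estimators=100, random_state=0)
)
category_model.fit(texts, categories)
cost_model.fit(texts, costs)

# Persist and reload, as a dashboard would load pre-trained models from models/*.pkl.
with tempfile.TemporaryDirectory() as tmp:
    cat_path = os.path.join(tmp, "category_model.pkl")
    cost_path = os.path.join(tmp, "cost_model.pkl")
    joblib.dump(category_model, cat_path)
    joblib.dump(cost_model, cost_path)
    loaded_category = joblib.load(cat_path)
    loaded_cost = joblib.load(cost_path)


def predict(description):
    """Return (category, estimated cost) for one job description."""
    category = loaded_category.predict([description])[0]
    cost = float(loaded_cost.predict([description])[0])
    return category, cost


category, cost = predict("new kitchen cabinets and countertops")
print(category, round(cost))
```

Bundling the vectorizer and the model in one pipeline means the saved file accepts raw description text. This keeps the dashboard code free of separate preprocessing steps. A random forest regressor averages training targets, so its cost estimates always stay within the range of costs seen during training.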
The NYC Renovation Trend Analysis project combines NLP, clustering, time-series analysis, and machine learning into a single Streamlit dashboard. It enables both exploratory insights and predictive analytics on historical renovation permit data across NYC (2010–2020).
By integrating end-to-end data engineering with domain-specific visualizations and modeling, the project demonstrates a complete applied data science workflow on real-world public data.