
Interactive Streamlit dashboard analyzing NYC renovation permits using NLP, clustering, time-series trends, and ML models. Includes keyword extraction, category prediction, PCA plots, and exportable visuals.


Harish-34/renovation-trend-analysis


πŸ—οΈ NYC Renovation Trend Analysis – Streamlit Dashboard


An interactive Streamlit dashboard to explore renovation permit trends across NYC using NLP, machine learning, and visual analytics.


🚀 Live App

🔗 Launch Dashboard: Click to open the Streamlit app

The app offers real-time filtering, keyword exploration, topic modeling, clustering, and predictive analytics in a single interface.


📖 Project Overview

This project is an end-to-end NYC Renovation Trend Analysis system. It combines NLP, clustering, machine learning, and an interactive Streamlit dashboard, covering the full path from raw permit data to predictions.

It uses historical NYC renovation permit filings (2010–2020) to uncover patterns such as:

  • 📊 Borough-wise renovation activity and cost trends
  • 🧠 Topic modeling (TF-IDF + NMF) to extract dominant renovation themes
  • 🔍 KMeans clustering with PCA visualization to group renovation jobs into types
  • 📈 Time-series decomposition to detect seasonal patterns
  • 🤖 ML-based predictions of renovation category and estimated cost
  • 📸 Exportable visuals and cleaned datasets for further analysis

🧠 Key Features

The system integrates multiple analytical components and interactive tools to deliver actionable insights from raw permit data. Below are the major capabilities offered by the dashboard:

  • 🔄 Interactive filtering by borough, job type, and time period
  • 📚 TF-IDF + NMF-based topic extraction from job descriptions
  • 📌 KMeans clustering with PCA-based 2D visualization
  • 🌐 Word clouds and labeled clusters for clear interpretation
  • 📅 Seasonal and yearly renovation trend analysis
  • 🧪 Live predictions using pre-trained ML models (category & cost)
  • 💾 Exportable charts and downloadable cleaned data
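Interactive filtering amounts to building a boolean mask over the permit table. The sketch below is an illustration only. It assumes the cleaned data has columns named borough, job_type and issuance_date; these names are hypothetical and not confirmed by the source. In the app, the selections would typically come from Streamlit widgets such as st.multiselect and st.date_input.

```python
from datetime import date

import pandas as pd


def filter_permits(df, boroughs=None, job_types=None, start=None, end=None):
    """Return permits matching the selected filters (hypothetical column names).

    An empty or None selection means "no filter" for that dimension.
    """
    mask = pd.Series(True, index=df.index)
    if boroughs:
        mask &= df["borough"].isin(boroughs)
    if job_types:
        mask &= df["job_type"].isin(job_types)
    if start is not None:
        mask &= df["issuance_date"] >= pd.Timestamp(start)
    if end is not None:
        mask &= df["issuance_date"] <= pd.Timestamp(end)
    return df[mask]


# Made-up sample rows, for illustration only.
permits = pd.DataFrame(
    {
        "borough": ["MANHATTAN", "BROOKLYN", "QUEENS", "BROOKLYN"],
        "job_type": ["A1", "A2", "A2", "A3"],
        "issuance_date": pd.to_datetime(
            ["2012-03-01", "2015-07-15", "2018-11-30", "2019-05-20"]
        ),
    }
)

subset = filter_permits(
    permits,
    boroughs=["BROOKLYN"],
    job_types=["A2", "A3"],
    start=date(2015, 1, 1),
    end=date(2019, 12, 31),
)
print(subset)
```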

📂 Dataset Source

The dataset used in this project is sourced from the official NYC Open Data platform:

🔗 NYC DOB Permit Issuance Dataset

This dataset contains detailed information about construction permits issued by the New York City Department of Buildings (DOB). This project uses the filings from 2010 to 2020. The dataset includes attributes such as:

  • Job type and description
  • Permit issuance date
  • Borough and location
  • Estimated cost of work
  • Permit type, status, and more
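Raw fields such as cost and issuance date usually need cleaning before analysis. The sketch below is minimal and rests on assumptions about the export format. It expects columns named BOROUGH, Issuance Date and Initial Cost, costs stored as strings like "$12,500.00", and dates in MM/DD/YYYY format.

```python
import re

import numpy as np
import pandas as pd


def parse_cost(value):
    """Convert a cost string such as "$12,500.00" to a float (NaN if unparseable).

    Assumes costs are non-negative: every character except digits and "." is dropped.
    """
    if pd.isna(value):
        return np.nan
    cleaned = re.sub(r"[^0-9.]", "", str(value))
    try:
        return float(cleaned)
    except ValueError:
        return np.nan


def clean_permits(raw):
    """Normalize column names, parse dates and costs, keep 2010-2020 rows."""
    df = raw.copy()
    # e.g. "Initial Cost" -> "initial_cost", "BOROUGH" -> "borough"
    df.columns = [re.sub(r"\W+", "_", c.strip().lower()).strip("_") for c in df.columns]
    # The date format is an assumption about the CSV export.
    df["issuance_date"] = pd.to_datetime(df["issuance_date"], format="%m/%d/%Y", errors="coerce")
    df["initial_cost"] = df["initial_cost"].map(parse_cost)
    df["borough"] = df["borough"].str.strip().str.upper()
    # Drop unusable rows and keep the study window used in this project.
    df = df.dropna(subset=["issuance_date", "initial_cost"])
    df = df[df["issuance_date"].dt.year.between(2010, 2020)]
    return df.reset_index(drop=True)


# Made-up raw rows: one valid, one with a bad date, one outside 2010-2020.
raw = pd.DataFrame(
    {
        "BOROUGH": [" brooklyn", "QUEENS", "Manhattan "],
        "Issuance Date": ["03/15/2012", "not a date", "07/01/2009"],
        "Initial Cost": ["$12,500.00", "$3,000.00", "$800.00"],
    }
)
clean = clean_permits(raw)
print(clean)
```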

This rich dataset forms the foundation for all NLP, clustering, and cost prediction models used in the analysis.


πŸ“ Project Structure

renovation-trend-analysis/
├── data/                        # Datasets used for analysis
│   ├── raw_data/                # Original downloaded NYC permit CSVs
│   └── processed_data/          # Cleaned/transformed datasets
│
├── models/                      # Trained ML model files
│   └── *.pkl                    # Saved models (e.g., NMF, KMeans, RandomForest)
│
├── reports/
│   └── images/                  # Exported plots, word clouds, visuals
│       ├── image1.png
│       ├── image2.png
│       └── ...
│
├── src/                         # Source code
│   └── streamlit_app.py         # Streamlit dashboard code
│
├── requirements.txt             # Python dependencies
└── README.md                    # Project overview and documentation

🔧 Tech Stack

This project brings together a full spectrum of data science and engineering tools to deliver interactive analytics, machine learning, and NLP in a single deployable app.

🎯 Frontend

  • Streamlit: For building interactive web dashboards and visualization UI.

🧠 Machine Learning & Modeling

  • Scikit-learn: Core ML library used for classification, regression, and clustering (Random Forest, KMeans, GridSearchCV).
  • Pipeline: Pipeline and make_pipeline are used to chain preprocessing and modeling steps.
  • Model Persistence: joblib for saving/loading trained ML models.
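These pieces fit together as a single pipeline: the text is vectorized, a Random Forest classifies it, GridSearchCV tunes the model, and joblib saves and reloads the result. The sketch below uses made-up descriptions, hypothetical "kitchen"/"bathroom" labels, and an illustrative parameter grid. It is not the repository's actual training setup.

```python
import os
import tempfile

import joblib
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline

# Toy job descriptions and hypothetical category labels, for illustration only.
descriptions = [
    "replace kitchen cabinets and countertops",
    "new kitchen sink and appliances",
    "kitchen renovation with new flooring",
    "remodel kitchen plumbing fixtures",
    "replace bathroom tiles and vanity",
    "new bathroom shower and toilet",
    "bathroom renovation with new fixtures",
    "remodel bathroom plumbing and tub",
]
labels = ["kitchen"] * 4 + ["bathroom"] * 4

pipeline = make_pipeline(
    TfidfVectorizer(stop_words="english"),
    RandomForestClassifier(random_state=42),
)
# make_pipeline names each step after its lowercased class name.
search = GridSearchCV(
    pipeline,
    param_grid={"randomforestclassifier__n_estimators": [25, 50]},
    cv=2,
)
search.fit(descriptions, labels)

# Persist the best pipeline, then reload it as a dashboard would at startup.
model_path = os.path.join(tempfile.mkdtemp(), "category_model.pkl")
joblib.dump(search.best_estimator_, model_path)
loaded = joblib.load(model_path)
prediction = loaded.predict(["install new kitchen cabinets"])[0]
print(prediction)
```

Saving the whole pipeline, not just the classifier, keeps the fitted vocabulary together with the model. Raw text can then be passed straight to predict.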

πŸ—£οΈ Natural Language Processing (NLP)

  • TF-IDF (TfidfVectorizer): Vectorizes job descriptions for topic modeling and ML.
  • NMF (Non-negative Matrix Factorization): Topic extraction from TF-IDF features.

πŸ” Clustering & Dimensionality Reduction

  • KMeans: For job clustering based on textual features.
  • PCA (Principal Component Analysis): For reducing dimensionality and visualizing clusters.
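Clustering and projection can be combined as follows: KMeans groups the TF-IDF vectors, and PCA projects the same vectors to two dimensions so the clusters can be plotted. This is a sketch on made-up text with an illustrative cluster count. For a large sparse matrix, TruncatedSVD is the usual substitute for the dense PCA call shown here.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.feature_extraction.text import TfidfVectorizer

# Made-up job descriptions standing in for the permit text column.
docs = [
    "replace water heater and gas piping",
    "install new plumbing fixtures and gas line",
    "replace boiler and plumbing risers",
    "facade repair and masonry restoration",
    "repoint brick facade and repair parapet",
    "masonry restoration of exterior walls",
]

X = TfidfVectorizer(stop_words="english").fit_transform(docs)

# n_init is set explicitly so behavior does not depend on the library default.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
cluster_labels = kmeans.fit_predict(X)

# A dense copy is fine for a small sample.
coords = PCA(n_components=2, random_state=0).fit_transform(X.toarray())

# Tidy table ready for a scatter plot colored by cluster.
plot_df = pd.DataFrame({"pc1": coords[:, 0], "pc2": coords[:, 1], "cluster": cluster_labels})
print(plot_df)
```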

📈 Time Series & Seasonality

  • statsmodels: seasonal_decompose used to identify seasonality and trends in renovation permits over time.

📊 Data Analysis & Preprocessing

  • Pandas, NumPy: Core libraries for data cleaning, transformation, and manipulation.
  • Regex, IO, OS: Utilities for data handling, parsing, and dynamic path management.
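Regex is typically used to normalize job descriptions before TF-IDF. The sketch below lowercases the text, strips a boilerplate phrase, removes tokens containing digits and collapses punctuation. The "as per plans" pattern and the helper name are hypothetical examples, not rules taken from the repository.

```python
import re


def normalize_description(text):
    """Lowercase a job description and strip noise before vectorizing (hypothetical rules)."""
    text = str(text).lower()
    # Hypothetical boilerplate phrase common in filings.
    text = re.sub(r"\b(?:as per|per) plans?\b", " ", text)
    # Drop tokens that contain digits (floor numbers, unit counts, etc.).
    text = re.sub(r"\b\w*\d\w*\b", " ", text)
    # Replace remaining punctuation with spaces, then collapse whitespace.
    text = re.sub(r"[^a-z\s]", " ", text)
    return re.sub(r"\s+", " ", text).strip()


print(normalize_description("Interior renovation of 2nd-floor kitchen, as per plans."))
```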

πŸ–ΌοΈ Visualization

  • Matplotlib, Seaborn: Static charts, trend lines, box plots.
  • Plotly Express: Interactive bar and line charts in Streamlit.
  • WordCloud: To generate word clouds for dominant job themes.
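For exportable visuals, one option is to render a chart into an in-memory PNG with io.BytesIO. The same bytes can then be written to reports/images/ or passed to Streamlit's st.download_button as its data argument (with mime="image/png"). The sketch below assumes a column named borough. It uses Matplotlib's object-oriented Figure rather than pyplot; this is a design choice suited to a multi-user web app, not necessarily what the repository does.

```python
import io

import pandas as pd
from matplotlib.figure import Figure


def borough_bar_png(df):
    """Render permit counts per borough as a bar chart and return PNG bytes."""
    counts = df["borough"].value_counts()  # sorted from most to fewest permits
    # Figure() avoids pyplot's global state.
    fig = Figure(figsize=(6, 4))
    ax = fig.subplots()
    ax.bar(list(counts.index.astype(str)), counts.to_numpy(), color="steelblue")
    ax.set_xlabel("Borough")
    ax.set_ylabel("Permits")
    ax.set_title("Renovation Jobs by Borough")
    fig.tight_layout()
    buffer = io.BytesIO()
    fig.savefig(buffer, format="png", dpi=150)
    return buffer.getvalue()


# Made-up sample rows, for illustration only.
sample = pd.DataFrame({"borough": ["BROOKLYN", "QUEENS", "BROOKLYN", "MANHATTAN"]})
png_bytes = borough_bar_png(sample)
print(len(png_bytes), "bytes")
```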

📸 Visualizations

Below are the key visuals generated by the analysis, including permit trends, cost distributions, topic models, and clustering insights:

πŸ“ Renovation Jobs by Borough
πŸ“… Time Trends – Monthly Permit Analysis
πŸ’° Avg Initial Renovation Cost by Borough
🧱 Distribution of Job Types (A1/A2/A3)
πŸ“ˆ Cost Trends (2010–2020) by Borough
πŸ“Š Faceted Cost Trend View (w/ Avg)
🌐 Topic 1 Word Cloud
πŸ” PCA Cluster Plot
πŸ“ˆ Cluster Trends Over Time
🧩 Renovation Categories (Clustering)
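Charts such as the cost trends by borough and the cluster trends over time come down to two aggregations: a mean cost per year and borough, and a count of permits per year and cluster. The sketch below uses made-up rows. The column names (issuance_date, borough, initial_cost, cluster) are assumptions.

```python
import pandas as pd

# Made-up rows; "cluster" would come from the KMeans step.
permits = pd.DataFrame(
    {
        "issuance_date": pd.to_datetime(
            ["2011-02-01", "2011-06-10", "2012-03-05", "2012-09-17", "2012-11-02"]
        ),
        "borough": ["BROOKLYN", "QUEENS", "BROOKLYN", "BROOKLYN", "QUEENS"],
        "initial_cost": [10000.0, 4000.0, 12000.0, 8000.0, 6000.0],
        "cluster": [0, 1, 0, 1, 1],
    }
)
year = permits["issuance_date"].dt.year.rename("year")

# Average initial cost per borough per year (rows: year, columns: borough).
cost_trend = permits.groupby([year, "borough"])["initial_cost"].mean().unstack("borough")

# Permit counts per cluster per year, the table behind a cluster-trend line chart.
cluster_trend = pd.crosstab(year, permits["cluster"])

print(cost_trend)
print(cluster_trend)
```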

πŸ› οΈ How to Run Locally

✅ Clone Repository

git clone https://github.com/Harish-34/renovation-trend-analysis.git
cd renovation-trend-analysis

✅ Create Virtual Environment

python -m venv venv
source venv/bin/activate  # For Windows: venv\Scripts\activate

✅ Install Requirements

pip install -r requirements.txt

✅ Launch Streamlit App

streamlit run src/streamlit_app.py

📦 Deliverables

This project provides multiple actionable outputs that can be directly used or extended:

  • 📥 Downloadable Cleaned Datasets: Ready-to-use CSV files for further analysis
  • 📊 Exportable Visuals: PNG charts for presentations and reports
  • 🤖 Live Prediction Tool: Instant ML-based predictions of renovation category and cost
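The source does not document which features the cost model uses. The sketch below rests on several assumptions: the inputs are the job description and the borough, the model is a Random Forest regressor, and it is trained on log1p(cost) because permit costs are heavily right-skewed. The training rows are made up. An unseen borough is encoded as all zeros through handle_unknown="ignore".

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Made-up training rows, for illustration only.
train = pd.DataFrame(
    {
        "job_description": [
            "gut renovation of kitchen and bathroom",
            "replace kitchen cabinets",
            "new bathroom fixtures",
            "facade repair and masonry restoration",
            "interior partitions and new flooring",
            "replace windows",
            "roof replacement",
            "plumbing upgrade for kitchen",
        ],
        "borough": ["MANHATTAN", "BROOKLYN", "QUEENS", "MANHATTAN",
                    "BRONX", "QUEENS", "BROOKLYN", "BRONX"],
        "initial_cost": [85000.0, 15000.0, 9000.0, 120000.0,
                         30000.0, 7000.0, 25000.0, 12000.0],
    }
)

features = ColumnTransformer(
    [
        # A scalar column name passes 1-D text, which TfidfVectorizer expects.
        ("text", TfidfVectorizer(stop_words="english"), "job_description"),
        ("borough", OneHotEncoder(handle_unknown="ignore"), ["borough"]),
    ]
)
model = Pipeline(
    [
        ("features", features),
        ("regressor", RandomForestRegressor(n_estimators=100, random_state=0)),
    ]
)
# Fit on log1p(cost) (an assumption) and invert with expm1 at prediction time.
model.fit(train[["job_description", "borough"]], np.log1p(train["initial_cost"]))

new_job = pd.DataFrame(
    {"job_description": ["gut renovation of kitchen"], "borough": ["STATEN ISLAND"]}
)
estimate = float(np.expm1(model.predict(new_job))[0])
print(f"Estimated cost: ${estimate:,.0f}")
```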

🧾 Conclusion

The NYC Renovation Trend Analysis project combines NLP, clustering, time-series analysis, and machine learning into a single Streamlit dashboard. It enables both exploratory insights and predictive analytics on historical renovation permit data across NYC (2010–2020).

By integrating end-to-end data engineering with domain-specific visualizations and modeling, this solution demonstrates a complete applied data science workflow on real-world public data.


πŸ™‹β€β™€οΈ Author

Harish Chowdary
💼 LinkedIn
🌐 Live App
