This project develops a content-based recommendation system for New York City Airbnb listings, utilizing Airbnb Open Data, customer reviews, and NYC subway station locations. The system helps users find the most suitable Airbnb listings based on their preferences for neighborhood, price point, and review rating. The recommendation includes the nearest subway station and distance for convenience.
The project incorporates three datasets:
- NYC Airbnb Listings (42,931 observations) – Airbnb listing details such as location, price, and availability.
- NYC Airbnb Reviews (1,110,024 observations) – Customer reviews for sentiment analysis.
- NYC Subway Stations (473 observations) – Subway locations for accessibility recommendations.
The goal is to preprocess these datasets, extract meaningful features, and build a recommendation system.
-
Exploratory Data Analysis (EDA)
- Histograms, scatterplots, and geospatial maps used to visualize distributions and relationships between features.
- Identification of high-priced listings and popular Airbnb neighborhoods.
- Outlier detection and removal.
-
Sentiment Analysis for Review Scores
- Applied RoBERTa (Robustly Optimized BERT Approach), a pre-trained NLP model, to analyze review sentiment.
- Generated compound sentiment scores (scaled 1 to 5) for use in recommendations.
-
Clustering Airbnb Listing Names
- Used TF-IDF and K-Means clustering to group listings based on descriptive text.
- Determined 9 optimal clusters via the Elbow method.
-
Calculating Nearest Subway Station
- Used Geopy’s geodesic distance function to compute distances between each Airbnb and its nearest subway station.
- Stored nearest subway station name and distance (in miles) as features.
- Content-Based Filtering Approach
- Users provide three preferences:
- Neighborhood Group (e.g., Manhattan, Brooklyn)
- Maximum Price
- Minimum Review Rating
- Filters Airbnb listings based on user inputs.
- Uses Cosine Similarity to compare user preferences with listing features.
- Returns top five most relevant listings along with:
- Listing Name
- Neighborhood
- Review Score
- Nearest Subway Station & Distance
- Users provide three preferences:
- Explored Review Score Effectiveness
- Compared RoBERTa-generated sentiment scores against raw reviews.
- Evaluated K-Means Clustering Performance
- Used Elbow method to select the best number of clusters.
- Validated Recommendation Quality
- Ensured recommendations align with user preferences.
- Python (for data processing and modeling).
- Pandas & NumPy (for data handling and transformation).
- Scikit-learn (for TF-IDF, K-Means clustering, and similarity computation).
- NLTK & Transformers (Hugging Face) (for NLP-based review sentiment analysis).
- Geopy (for distance calculations).
- Matplotlib & Seaborn (for visualization).
- Neighborhood: Brooklyn
- Max Price: $150
- Minimum Review Score: 4.0
- Cozy Studio in Williamsburg
- Review Score: 4.7
- Price: $140
- Nearest Subway: Bedford Ave (0.3 miles)
- Bright Apartment in Bushwick
- Review Score: 4.5
- Price: $135
- Nearest Subway: Jefferson St (0.5 miles)
- Quiet Retreat near Prospect Park
- Review Score: 4.6
- Price: $120
- Nearest Subway: Parkside Ave (0.2 miles)
- Budget-Friendly Stay in Greenpoint
- Review Score: 4.3
- Price: $110
- Nearest Subway: Greenpoint Ave (0.4 miles)
- Loft-Style Apartment in DUMBO
- Review Score: 4.8
- Price: $149
- Nearest Subway: York St (0.3 miles)