This project predicts Airbnb rental prices in New York City by combining exploratory data analysis, NLP-based feature engineering, and regression modeling. We begin with an exploratory data analysis (EDA) of a dataset of 39,202 Airbnb listings, whose features range from basic property details to host characteristics and free-text descriptions. For details, see the full report: Airbnb Price Prediction: Leveraging Sentiment Analysis and NER Enhanced Regression Model.pdf
We handle missing values and engineer new features using natural language processing (NLP): sentiment analysis produces sentiment scores for the textual descriptions, and named entity recognition (NER) quantifies references to amenities and locations.
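To make the idea of text-derived features concrete, here is a minimal sketch in plain Python. It is not the project's pipeline: the word lists, function names, and the keyword matching that stands in for real sentiment analysis and NER are all illustrative assumptions.

```python
import re

# Hypothetical toy lexicons, for illustration only. The project's actual
# sentiment model and NER pipeline are not reproduced here.
POSITIVE = {"cozy", "spacious", "bright", "clean", "quiet"}
NEGATIVE = {"noisy", "small", "dirty", "dark"}
AMENITIES = {"wifi", "kitchen", "washer", "elevator", "gym"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def sentiment_score(text):
    """Return (positive - negative) word count divided by token count."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    return (pos - neg) / len(tokens)


def amenity_count(text):
    """Count distinct amenity keywords (a crude stand-in for NER)."""
    return len(AMENITIES & set(tokenize(text)))


def text_features(description):
    """Bundle the text-derived features for one listing description."""
    return {
        "sentiment": sentiment_score(description),
        "amenity_count": amenity_count(description),
    }


features = text_features(
    "Bright, clean studio with wifi and kitchen. Slightly noisy."
)
print(features)
```

Each listing's description is reduced to a few numeric columns that can sit alongside the structured features in a regression model. A trained sentiment model and a proper NER tagger would replace the keyword sets in practice.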
We then experiment with several machine learning models, including Lasso Regression, Support Vector Regression (SVR), and CatBoost Regression. Each model is trained and then evaluated using Root Mean Squared Error (RMSE) and R-squared (R²).
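For reference, the two evaluation metrics can be sketched in a few lines of plain Python. The function names and the sample prices below are illustrative assumptions, not values from the project; in practice a library implementation would likely be used instead.

```python
import math


def rmse(y_true, y_pred):
    """Root Mean Squared Error: square root of the mean squared residual."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def r_squared(y_true, y_pred):
    """R²: 1 minus the ratio of residual to total sum of squares."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Made-up nightly prices and predictions, for illustration only.
actual = [100.0, 200.0, 300.0]
predicted = [110.0, 190.0, 310.0]

print(rmse(actual, predicted))       # error in the same units as price
print(r_squared(actual, predicted))  # share of price variance explained
```

RMSE is expressed in the same units as the target, so it reads directly as a typical prediction error, while R² is unitless and makes models easier to compare across differently scaled targets.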