This repository contains the code and data for a pricing strategy analysis based on Airbnb listings in Los Angeles County. The analysis aims to answer the following questions:
- What factors contribute to the price of an Airbnb listing?
- Can we predict the price of a listing given its features?
- How do availability and stay duration affect listing prices?
Check out my Medium article detailing the analysis.
- Overview
- Requirements
- Usage
- Notebooks
- Models
- Contributing
- License
The main focus of this project is to explore and analyze Airbnb listings data to gain insights into the short-term rental market, and specifically into pricing. The repository includes a preprocessing pipeline that handles data cleaning, transformation, and feature engineering, making the data suitable for machine learning algorithms and other data-driven tasks.
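The transformation step of such a pipeline can be sketched with scikit-learn's ColumnTransformer, which imputes and scales numeric features and one-hot encodes categorical ones. This is a minimal sketch, not the repository's actual pipeline: the column names are assumptions based on the Inside Airbnb listings schema, and build_preprocessor and fit_preprocessor are hypothetical helpers.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Assumed feature lists; adjust to the columns actually kept after cleaning.
NUMERIC_FEATURES = ["accommodates", "bedrooms", "minimum_nights", "availability_365"]
CATEGORICAL_FEATURES = ["room_type", "neighbourhood_cleansed"]


def build_preprocessor() -> ColumnTransformer:
    """Impute and scale numeric columns; impute and one-hot encode categorical ones."""
    numeric = Pipeline(steps=[
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
    ])
    categorical = Pipeline(steps=[
        ("impute", SimpleImputer(strategy="most_frequent")),
        # Categories unseen during fit are encoded as all zeros instead of raising.
        ("onehot", OneHotEncoder(handle_unknown="ignore")),
    ])
    return ColumnTransformer(transformers=[
        ("num", numeric, NUMERIC_FEATURES),
        ("cat", categorical, CATEGORICAL_FEATURES),
    ])


def fit_preprocessor(df: pd.DataFrame):
    """Fit the preprocessor on df and return (fitted preprocessor, feature matrix)."""
    preprocessor = build_preprocessor()
    features = preprocessor.fit_transform(df)
    return preprocessor, features
```

Keeping imputation inside the pipeline means the medians and most-frequent values learned on training data are reused unchanged when transforming new listings.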
- Python 3.7 or higher
- pandas
- numpy
- scikit-learn
- matplotlib
- seaborn
- plotly
- xgboost
- Clone the repository and move into it:
  git clone https://github.com/naoufal51/airbnb.git
  cd airbnb
- Download the Airbnb listings dataset (in CSV format) from Inside Airbnb and place it in the data/raw/airbnb/ directory. The data used to plot the tourist attraction heatmap can be downloaded from mygeodata.
- Install the required Python packages using the requirements.txt file:
  pip install -r requirements.txt
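Once the CSV is in place, it can be loaded with pandas. Inside Airbnb stores prices as strings such as "$1,250.00", so they need converting to numbers before any analysis. A minimal sketch, assuming the file is named listings.csv and has a price column; load_listings and parse_price are hypothetical names, not functions from this repository.

```python
import pandas as pd

# Assumed file name; point this at the CSV downloaded from Inside Airbnb.
LISTINGS_PATH = "data/raw/airbnb/listings.csv"


def parse_price(prices: pd.Series) -> pd.Series:
    """Convert strings like "$1,250.00" to floats; unparseable values become NaN."""
    cleaned = prices.astype(str).str.replace(r"[$,]", "", regex=True)
    return pd.to_numeric(cleaned, errors="coerce")


def load_listings(path=LISTINGS_PATH) -> pd.DataFrame:
    """Read the raw listings CSV and replace the price column with numeric values."""
    df = pd.read_csv(path, low_memory=False)
    df["price"] = parse_price(df["price"])
    return df
```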
The analysis is broken down into three parts, each corresponding to a question listed above.
- Question 1: The notebooks data_cleaning.ipynb and data_exploration.ipynb in the question_1 folder perform data cleaning, exploration, and feature engineering to identify the factors contributing to the price of an Airbnb listing.
- Question 2: The notebook model_price.ipynb in the question_2 folder performs additional feature engineering, trains machine learning models, and evaluates their performance for predicting the price of a listing given its features.
- Question 3: The notebook data_exploration.ipynb in the question_3 folder analyzes the impact of seasonality, availability, and stay duration on listing prices.
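For question 3, one simple way to look at stay duration and availability is to bucket the minimum stay and compare median prices per bucket, and to compute a rank correlation between yearly availability and price. The sketch below assumes cleaned listings with numeric minimum_nights, availability_365 and price columns; the bucket boundaries and function names are illustrative choices, not the ones used in the notebook.

```python
import pandas as pd

# Illustrative minimum-stay buckets: (0, 1], (1, 6], (6, 29], (29, inf).
STAY_BINS = [0, 1, 6, 29, float("inf")]
STAY_LABELS = ["1 night", "2-6 nights", "7-29 nights", "30+ nights"]


def median_price_by_min_stay(df: pd.DataFrame) -> pd.Series:
    """Median price for each minimum-stay bucket (empty buckets give NaN)."""
    buckets = pd.cut(df["minimum_nights"], bins=STAY_BINS, labels=STAY_LABELS)
    return df.groupby(buckets, observed=False)["price"].median()


def availability_price_correlation(df: pd.DataFrame) -> float:
    """Spearman correlation, computed as the Pearson correlation of average ranks."""
    return df["availability_365"].rank().corr(df["price"].rank())
```

Computing Spearman's coefficient from ranks avoids a SciPy dependency and is less sensitive than Pearson's to the long right tail typical of listing prices.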
The data used in this analysis is stored in the data folder. The raw data is located in the raw subfolder, while the preprocessed data is stored in the processed subfolder.
Trained machine learning models for question 2 are stored in the models/question_2 folder.
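Training, evaluating and persisting a model for question 2 might look roughly like the sketch below. It uses scikit-learn's GradientBoostingRegressor as a stand-in; xgboost's XGBRegressor, which is listed in the requirements, follows the same fit/predict interface and could be swapped in. The file name price_model.joblib, the 80/20 split and the helper names are assumptions, and the stored models may use a different format.

```python
from pathlib import Path

import joblib
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split

# Hypothetical output file inside the models/question_2 folder.
MODEL_PATH = Path("models/question_2/price_model.joblib")


def train_and_evaluate(X, y, random_state=42):
    """Fit a gradient-boosted regressor on 80% of the data and score it on the rest."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=random_state
    )
    model = GradientBoostingRegressor(random_state=random_state)
    model.fit(X_train, y_train)
    preds = model.predict(X_test)
    metrics = {
        "mae": mean_absolute_error(y_test, preds),
        "r2": r2_score(y_test, preds),
    }
    return model, metrics


def save_model(model, path=MODEL_PATH) -> Path:
    """Persist a fitted model with joblib, creating the parent folder if needed."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    joblib.dump(model, path)
    return path
```

A model saved this way can be reloaded with joblib.load(path) and used directly through its predict method.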
Refer to Inside Airbnb and mygeodata for the data licenses. The code in this repository can be used freely.