naoufal51/airbnb

Airbnb Data Analysis

This repository contains the code and data for a pricing strategy analysis of Airbnb listings in Los Angeles County. The analysis aims to answer the following questions:

  • What factors contribute to the price of an Airbnb listing?
  • Can we predict the price of a listing given its features?
  • How do the availability and the stay duration impact listing prices?

See my Medium article for a detailed write-up of the analysis.

Table of Contents

  1. Overview
  2. Requirements
  3. Usage
  4. Notebooks
  5. Data
  6. Models
  7. License

Overview

The main focus of this project is to explore and analyze Airbnb listings data to gain insights into the short-term rental market, and specifically into pricing. The repository includes a preprocessing pipeline that handles data cleaning, transformation, and feature engineering, making the data suitable for machine learning algorithms or other data-driven tasks.

Requirements

  • Python 3.7 or higher
  • pandas
  • numpy
  • scikit-learn
  • matplotlib
  • seaborn
  • plotly
  • xgboost
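
As a quick sanity check before opening the notebooks, the sketch below reports which of the packages listed above cannot be found in the current environment. The helper name missing_modules is hypothetical and not part of this repository; note that scikit-learn is imported under the name sklearn.

```python
from importlib.util import find_spec

# Import names for the packages listed above.
# scikit-learn is imported as "sklearn".
REQUIRED_MODULES = [
    "pandas", "numpy", "sklearn", "matplotlib", "seaborn", "plotly", "xgboost",
]


def missing_modules(names):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in names if find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are available.")
```

An empty result only means the modules can be located; it does not check their versions.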

Usage

  1. Clone the repository:

    git clone https://github.com/naoufal51/airbnb.git

  2. Download the Airbnb listings dataset (in CSV format) from Inside Airbnb and place it in the data/raw/airbnb/ directory. The data used to plot the tourist attraction heatmap can be downloaded from mygeodata.

  3. Install the required Python packages using the requirements.txt file:

    pip install -r requirements.txt
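
Once the data is in place, a first look at the listings might go along the following lines. This is a minimal sketch, not the repository's actual pipeline: the function name load_listings is hypothetical, and it assumes the CSV has a price column stored as text such as "$1,250.00". An in-memory sample stands in for the real file so the snippet runs on its own.

```python
import io

import pandas as pd


def load_listings(source):
    """Read a listings CSV and convert the price column to float."""
    df = pd.read_csv(source)
    # Assumption: prices are text such as "$1,250.00", so strip the
    # dollar sign and thousands separators before converting.
    df["price"] = (
        df["price"].astype(str).str.replace(r"[$,]", "", regex=True).astype(float)
    )
    return df


# Tiny in-memory sample; replace it with the path to the CSV you
# downloaded into data/raw/airbnb/.
sample = io.StringIO('id,price\n1,"$1,250.00"\n2,$85.00\n')
listings = load_listings(sample)
print(listings["price"].tolist())
```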

Notebooks

The analysis is broken down into three parts, each corresponding to a question listed above.

  • Question 1: The notebooks data_cleaning.ipynb and data_exploration.ipynb in the question_1 folder perform data cleaning, exploration, and feature engineering to identify the factors contributing to the price of an Airbnb listing.
  • Question 2: The notebook model_price.ipynb in the question_2 folder performs additional feature engineering, trains machine learning models, and evaluates their performance for predicting the price of a listing given its features.
  • Question 3: The notebook data_exploration.ipynb in the question_3 folder analyzes the impact of seasonality, availability, and stay duration on listing prices.
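
To give a feel for the kind of workflow Question 2 describes, here is a minimal sketch of training and evaluating a price regressor. It is not the code from model_price.ipynb: the data is synthetic, the feature names are hypothetical, and scikit-learn's GradientBoostingRegressor is used as a stand-in for whichever models the notebook actually trains.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 500

# Synthetic, hypothetical features standing in for real listing attributes.
bedrooms = rng.integers(1, 5, size=n)
accommodates = bedrooms * 2 + rng.integers(0, 2, size=n)
availability_365 = rng.integers(0, 366, size=n)
X = np.column_stack([bedrooms, accommodates, availability_365])

# Synthetic nightly price: depends on size, plus random noise.
y = 60 * bedrooms + 15 * accommodates + rng.normal(0, 20, size=n)

# Hold out 20% of the listings to evaluate the model on unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

model = GradientBoostingRegressor(random_state=0)
model.fit(X_train, y_train)

predictions = model.predict(X_test)
mae = mean_absolute_error(y_test, predictions)
print(f"MAE on held-out synthetic listings: {mae:.2f}")
```

Evaluating on a held-out split, rather than on the training data, gives a more honest estimate of how well the predicted prices would generalize to new listings.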

Data

The data used in this analysis is stored in the data folder. The raw data is located in the raw subfolder, while the preprocessed data is stored in the processed subfolder.

Models

Trained machine learning models for question 2 are stored in the models/question_2 folder.

License

See Inside Airbnb and mygeodata for the data licenses. The code is free to use.
