Hotel Recommendation System

Overview

Travelers often struggle to find the ideal hotel that matches their unique preferences and constraints. Whether searching for luxury accommodations, budget-friendly options, or hotels offering specific amenities, existing platforms rarely deliver personalized recommendations. This project develops a robust recommendation system tailored to individual traveler interests and needs.

Data Processing Pipeline

Deep Learning

Filters for actual bookings (is_booking = 1)
Separates user and hotel features
Encodes categorical variables (e.g., continent, country, region, city)
Calculates reservation duration from check-in and check-out dates
Handles missing values
Splits data into training and testing sets
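
The steps above can be sketched roughly as follows. This is a minimal illustration with pandas, not the project's actual preprocessing script: `is_booking` comes from the description above, while the column names `srch_ci`, `srch_co`, and `hotel_country`, the median fill for missing durations, and the 80/20 split are assumptions made for the example.

```python
import pandas as pd


def prepare_bookings(df: pd.DataFrame) -> pd.DataFrame:
    """Keep confirmed bookings, add a stay-length feature, encode a categorical."""
    # Keep only rows that represent an actual booking.
    bookings = df[df["is_booking"] == 1].copy()

    # Reservation duration in nights (assumed column names srch_ci / srch_co).
    check_in = pd.to_datetime(bookings["srch_ci"], errors="coerce")
    check_out = pd.to_datetime(bookings["srch_co"], errors="coerce")
    bookings["stay_nights"] = (check_out - check_in).dt.days

    # Assumed missing-value strategy: fill unknown durations with the median.
    median_nights = bookings["stay_nights"].median()
    bookings["stay_nights"] = bookings["stay_nights"].fillna(median_nights)

    # Integer-encode a categorical column (assumed name hotel_country).
    bookings["hotel_country"] = bookings["hotel_country"].astype("category").cat.codes
    return bookings


def split_train_test(df: pd.DataFrame, train_frac: float = 0.8, seed: int = 0):
    """Random split into training and testing frames (assumed 80/20 ratio)."""
    train = df.sample(frac=train_frac, random_state=seed)
    test = df.drop(train.index)
    return train, test
```

Separating user and hotel features would then amount to selecting two column subsets from the returned frame.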

Traditional Approach

Transforms datetime columns into numerical features (e.g., year, month, day)
Applies winsorization to cap outliers using IQR-based thresholds:
- lower_bound = Q1 - 1.5 * IQR
- upper_bound = Q3 + 1.5 * IQR
Replaces sparse null values with column means
Ensures all features are numeric and suitable for model training
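
As a rough illustration of the winsorization and mean-imputation steps, the sketch below caps a numeric column at the IQR-based bounds given above and then fills remaining nulls with the column mean. The function names are hypothetical and chosen only for this example.

```python
import pandas as pd


def winsorize_iqr(series: pd.Series, factor: float = 1.5) -> pd.Series:
    """Cap values outside [Q1 - factor * IQR, Q3 + factor * IQR]."""
    q1 = series.quantile(0.25)
    q3 = series.quantile(0.75)
    iqr = q3 - q1
    lower_bound = q1 - factor * iqr
    upper_bound = q3 + factor * iqr
    # Values beyond the bounds are replaced by the bounds themselves.
    return series.clip(lower=lower_bound, upper=upper_bound)


def fill_nulls_with_mean(series: pd.Series) -> pd.Series:
    """Replace missing values with the mean of the observed values."""
    return series.fillna(series.mean())
```

For example, `winsorize_iqr(pd.Series([1, 2, 3, 4, 100]))` has Q1 = 2 and Q3 = 4, so the upper bound is 7 and the outlier 100 is capped to 7.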

Modeling Approaches

Naïve Rule-Based Classifier

Recommends the top K hotels ranked by popularity
MAP@5: 0.0745 on a subset of the data
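
A popularity baseline of this kind can be sketched in a few lines. The example assumes that popularity means the number of times each hotel label appears among past bookings, which the description above does not spell out; `top_k_popular` is a hypothetical helper name.

```python
from collections import Counter


def top_k_popular(booked_hotels, k=5):
    """Return the k most frequently booked hotel labels, most popular first."""
    counts = Counter(booked_hotels)
    return [hotel for hotel, _ in counts.most_common(k)]
```

Under this assumption every user receives the same list, which is why the baseline serves mainly as a floor for the other two approaches.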

Traditional Modeling Approach

We use an XGBoost classifier for multiclass hotel recommendation, trained with 5-fold stratified cross-validation to maintain class distribution across folds.

Model Configuration

objective="multi:softprob": Multiclass classification with probability outputs
num_class: Automatically set based on number of unique hotel labels
eval_metric="mlogloss": Multi-class log loss as the evaluation metric
max_depth=10: Limits tree depth to reduce overfitting
n_jobs=-1: Enables parallel processing using all available CPU cores

Deep Learning Approach

The deep learning approach uses a dual-tower neural network to generate hotel recommendations. The model maps user and hotel features into a shared 32-dimensional embedding space using fully connected layers with ReLU activation, batch normalization, and L2 normalization. Cosine similarity between user and hotel embeddings is computed to rank hotels. Given a user's search profile, the model retrieves the top K most similar hotels based on embedding similarity.
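
The retrieval step can be illustrated independently of the towers themselves. The sketch below assumes the user and hotel embeddings have already been produced by the two towers and only shows L2 normalization, cosine scoring, and top-K selection with NumPy; the function names are hypothetical.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale vectors along the last axis to unit length."""
    norms = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.maximum(norms, eps)


def top_k_hotels(user_embedding, hotel_embeddings, k=5):
    """Return indices and cosine scores of the k hotels closest to the user."""
    user = l2_normalize(np.asarray(user_embedding, dtype=float))
    hotels = l2_normalize(np.asarray(hotel_embeddings, dtype=float))
    # With unit-length vectors, the dot product equals cosine similarity.
    scores = hotels @ user
    order = np.argsort(-scores)[:k]
    return order, scores[order]
```

Because hotel embeddings do not depend on the user, they can be computed once and reused for every query, which is a common reason to choose a dual-tower design.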

Model Deployed Here: https://huggingface.co/BazeBai/Towers/tree/main

Previous Efforts

Hotel2Vec Embeddings

Sadeghian et al. developed Hotel2Vec, a neural network architecture that learns hotel embeddings by integrating user clicks, hotel attributes, amenities, and geographic information. This approach effectively tackles the cold-start problem by incorporating diverse data sources.
Paper: Hotel2Vec – Learning Hotel Embeddings

NLP-Based Sentiment Analysis

Aravani et al. proposed a framework utilizing BERT-based models to analyze user reviews, categorizing hotels into "Bad," "Good," or "Excellent" based on sentiment. This method enhances personalized recommendations by understanding user preferences through textual feedback.
Paper: Sentiment Analysis for Hotel Recommendation

Integration of ChatGPT and Persuasive Technologies

Remountakis et al. explored the incorporation of ChatGPT and persuasive techniques into hotel recommender systems. Their approach aims to generate context-aware, personalized suggestions by analyzing user preferences and online reviews.
Paper: ChatGPT in Recommender Systems

Evaluation Metric

The models were primarily evaluated using mean Average Precision (MAP), reported at a cutoff of 5 (MAP@5). MAP measures how well a model ranks hotels by averaging the per-query average precision across all queries, so it rewards both including the relevant hotel and placing it near the top of the list.
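
For reference, MAP@K can be computed as sketched below. The example assumes each query has exactly one relevant hotel (the one actually booked), in which case average precision reduces to the reciprocal of the rank at which that hotel appears, or zero if it is missing from the top K. The function names are hypothetical.

```python
def average_precision_at_k(actual, predicted, k=5):
    """AP@k with a single relevant item: 1/rank if it is in the top k, else 0."""
    for rank, item in enumerate(predicted[:k], start=1):
        if item == actual:
            return 1.0 / rank
    return 0.0


def map_at_k(actuals, predictions, k=5):
    """Mean of AP@k over all queries."""
    scores = [average_precision_at_k(a, p, k) for a, p in zip(actuals, predictions)]
    return sum(scores) / len(scores) if scores else 0.0
```

For three queries where the booked hotel is ranked first, second, and not at all, the per-query scores are 1, 0.5, and 0, giving a MAP@5 of 0.5.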

Comparison MAP@5:

Naive: 0.0745
Traditional ML: 0.4276
Deep Learning: 0.953

User Interface

Streamlit Web App:
- Interactive web interface for hotel recommendations
- Supports all three recommendation approaches
- Input is a randomly generated user profile drawn from the training data
- Displays hotel recommendations by rank and respective MAP scores

Setup

./setup.sh

This script creates the virtual environment if it does not already exist, activates it, installs the requirements, downloads the dataset (if it is not already present in the data directory), and pre-processes the data.

Running the Streamlit application locally

Assuming your virtual environment is set up and activated, and the requirements were installed by running setup.sh, you can start a local instance of the Streamlit application with the following command.

streamlit run main.py

Dataset & License

This repository uses the Expedia Hotel Dataset licensed under competition rules defined by Kaggle.

Ethics Statement

This project uses publicly available datasets in compliance with their terms of use. We ensure that all data is handled responsibly, avoiding any misuse, unauthorized distribution, or unethical applications. No personally identifiable information (PII) is collected or used.

Presentation Pitch

View our Pitch HERE.

Streamlit Application

Access our Streamlit application HERE.

ChatGPT was used to help refine and polish this README for better grammar and flow.
