Predicting Hotel Cancellations and ADR with Machine Learning

The purpose of this project is to predict hotel cancellations and ADR (average daily rate) values for two separate Portuguese hotels (H1 and H2). Included in the GitHub repository are the datasets and notebooks for all models run.

The original datasets and research by Antonio et al. can be found here: Hotel Booking Demand Datasets (2019). All other relevant references are cited in the articles below.

Articles

Forecasting Hotel Revenue: Predicting ADR Fluctuations with ARIMA

Average daily rate represents the average rate per day paid by a staying customer at a hotel. This is an important metric for a hotel, as it represents the overall profitability of each customer. In this example, auto_arima is used in Python to forecast the average daily rate over time for a hotel chain.

Handling Imbalanced Classification Data: Predicting Hotel Cancellations Using Support Vector Machines

When building a classification model, one must often contend with an imbalanced dataset, i.e. one where sample sizes differ between classes. Such an imbalance can bias the classifier's predictions toward the majority class. This example illustrates the use of a Support Vector Machine to classify hotel booking customers in terms of cancellation risk.
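As a rough illustration of the imbalance problem described above, the sketch below fits two linear SVMs with scikit-learn on synthetic data, one with default settings and one with `class_weight="balanced"`. The data, the 9:1 class ratio, and the choice of class weighting as the remedy are assumptions made for illustration only; they are not taken from the project notebooks.

```python
import numpy as np
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Synthetic stand-in for booking data (assumption, not the H1/H2 datasets):
# 900 non-cancellations (class 0) and 100 cancellations (class 1).
X_major = rng.normal(loc=0.0, scale=1.0, size=(900, 2))
X_minor = rng.normal(loc=1.5, scale=1.0, size=(100, 2))
X = np.vstack([X_major, X_minor])
y = np.array([0] * 900 + [1] * 100)

# Stratified split keeps the 9:1 ratio in both train and test sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Default SVM: every misclassification costs the same.
plain = SVC(kernel="linear").fit(X_train, y_train)

# Weighted SVM: errors on the minority class are penalised more heavily,
# in inverse proportion to class frequency.
weighted = SVC(kernel="linear", class_weight="balanced").fit(X_train, y_train)

# Recall on the minority (cancellation) class for each model.
recall_plain = recall_score(y_test, plain.predict(X_test))
recall_weighted = recall_score(y_test, weighted.predict(X_test))

print(f"recall (cancellations), unweighted: {recall_plain:.2f}")
print(f"recall (cancellations), balanced:   {recall_weighted:.2f}")
```

Weighting typically trades some precision on the majority class for higher recall on the minority class, which may be the preferable trade-off when missed cancellations are costly.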

Regression-based neural networks: Predicting Average Daily Rates for Hotels

When it comes to hotel bookings, average daily rate (ADR) is a particularly important metric. This reflects the average rate per day that a particular customer pays throughout their stay. In this particular example, a neural network is built in Keras to solve a regression problem, i.e. one where our dependent variable (y) is in interval format and we are trying to predict the quantity of y with as much accuracy as possible.

Project Stages

Stage 1: Data Manipulation and Feature Selection

Used pandas to collate over 115,000 individual cancellation and ADR entries into a weekly time series format.
Identified lead time, country of origin, market segment, deposit type, customer type, required car parking spaces, and week of arrival as the most important features in explaining the variation in hotel cancellations.
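The collation step above might look something like the following pandas sketch, which groups individual bookings by arrival year and week and aggregates them into one row per week. The column names (`arrival_year`, `arrival_week`, `is_canceled`, `adr`) and the sample rows are hypothetical placeholders, and averaging ADR over all bookings in a week is an assumption rather than the project's confirmed method.

```python
import pandas as pd

# Tiny synthetic stand-in for individual booking rows.
# Column names are hypothetical and may differ from those in H1.csv / H2.csv.
bookings = pd.DataFrame(
    {
        "arrival_year": [2015, 2015, 2015, 2015, 2016],
        "arrival_week": [27, 27, 28, 28, 1],
        "is_canceled": [0, 1, 1, 1, 0],
        "adr": [100.0, 80.0, 120.0, 70.0, 90.0],
    }
)

# One row per (year, week): total cancellations and mean ADR.
weekly = (
    bookings.groupby(["arrival_year", "arrival_week"], as_index=False)
    .agg(cancellations=("is_canceled", "sum"), mean_adr=("adr", "mean"))
    .sort_values(["arrival_year", "arrival_week"])
    .reset_index(drop=True)
)

print(weekly)
```

Grouping on both year and week keeps week 1 of one year from being merged with week 1 of another.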

Stage 2: Classification

Trained classification models on the H1 dataset and tested against the H2 dataset. Used boto3 and botocore to import data from AWS S3 bucket to SageMaker.
Used the Explainable Boosting Classifier by InterpretML, KNN, Naive Bayes, Support Vector Machines, and XGBoost to predict cancellations across the test set.
SVM demonstrated the best overall performance, with an F1-score of 71% and a recall of 66% on the cancellation class.
An ANN model was also trained in conjunction with dice_ml to identify Diverse Counterfactual Explanations for hotel bookings, i.e. changes in feature parameters that would cause a non-canceling customer to cancel, and vice versa.
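For readers less familiar with the metrics quoted above, the following minimal sketch computes precision, recall, and F1 for the positive (cancellation) class from raw labels. The label vectors are made up for illustration and the helper name `precision_recall_f1` is hypothetical; the project notebooks may well rely on a library routine instead.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Return (precision, recall, f1) for the given positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Made-up labels: 1 = canceled, 0 = not canceled.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 1, 0, 1]

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Recall on the cancellation class answers "what share of actual cancellations did the model catch?", which is why it is reported separately from the F1-score.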

Stage 3: Regression

Used regression modelling to predict ADR (average daily rate) for each customer.
Trained regression models on the H1 dataset and tested against the H2 dataset.
A regression-based neural network with the ELU activation function showed the best performance, with a mean absolute error of 29 compared to the mean ADR of 105 across the test set.
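To make the terms above concrete, here is a small NumPy sketch of the ELU activation and of the mean absolute error, followed by the reported MAE expressed as a share of the mean ADR. The function names and the default `alpha=1.0` are assumptions for illustration; this is not the network from the notebooks, only the two building blocks the result refers to.

```python
import numpy as np


def elu(x, alpha=1.0):
    """Exponential linear unit: x for x > 0, alpha * (exp(x) - 1) otherwise."""
    x = np.asarray(x, dtype=float)
    # np.minimum guards against overflow in expm1 for large positive inputs.
    return np.where(x > 0, x, alpha * np.expm1(np.minimum(x, 0.0)))


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between targets and predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_true - y_pred)))


# Unlike ReLU, ELU passes a small negative signal for negative inputs.
activations = elu([-1.0, 0.0, 2.0])
print(activations)

# Made-up ADR values, purely to show the metric.
example_mae = mean_absolute_error([100.0, 120.0, 80.0], [110.0, 100.0, 85.0])
print(f"example MAE: {example_mae:.2f}")

# Figures reported above: MAE of 29 against a mean ADR of 105.
relative_error = 29.0 / 105.0
print(f"MAE as a share of mean ADR: {relative_error:.1%}")
```

Reading the MAE against the mean ADR puts the error in context: an error of 29 is a little over a quarter of the typical rate.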

Stage 4: Time Series

Used ARIMA and LSTM models to forecast weekly ADR trends.
ARIMA produced the best results in forecasting ADR for H1 (RMSE of 10 relative to a mean ADR of 160) and H2 (RMSE of 8 relative to a mean ADR of 131).
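The RMSE figures above are easier to compare once each is divided by the corresponding mean ADR. The sketch below defines RMSE in plain Python and applies that normalisation to the reported numbers; the `rmse` helper and the short sample series are illustrative assumptions, not output from the ARIMA notebooks.

```python
import math


def rmse(actual, forecast):
    """Root mean squared error between two equal-length sequences."""
    if len(actual) != len(forecast):
        raise ValueError("actual and forecast must have the same length")
    squared = [(a - f) ** 2 for a, f in zip(actual, forecast)]
    return math.sqrt(sum(squared) / len(squared))


# Made-up weekly ADR values and forecasts, only to exercise the function.
sample_rmse = rmse([160.0, 150.0], [157.0, 154.0])
print(f"sample RMSE: {sample_rmse:.3f}")

# Reported results: (RMSE, mean ADR) for each hotel.
reported = {"H1": (10.0, 160.0), "H2": (8.0, 131.0)}

# RMSE as a fraction of mean ADR, so the two hotels are comparable.
relative = {hotel: err / mean for hotel, (err, mean) in reported.items()}
for hotel, share in relative.items():
    print(f"{hotel}: RMSE is {share:.1%} of mean ADR")
```

On this normalised basis the two hotels come out very close, at roughly 6% of mean ADR each.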


MGCodesandStats/hotel-modelling
