Prediction Classify Hotel Booking

Table of Contents

Business Problem
Goal
Abstract
The Data
Analysis
Modeling
- Experiment 1
- Experiment 2
Final Model
Conclusion

Business Problem

Hotel cancellation rates have been rising steadily for years, fueled in part by the ‘book now, pay later’ culture created by the large online travel agencies (OTAs). As a hotelier, you need to understand these patterns in order to decide how to respond. In 2019, D-Edge Hospitality Solutions reported that the global cancellation rate of hotel reservations had reached 40% (with the direct website channel keeping the lowest cancellation rate). Hoteliers cannot afford to ignore this trend and should use every tool at their disposal to minimise the impact on revenue.

Goal

This project applies basic machine learning concepts to collected booking data in order to build a model that classifies whether a hotel booking will be canceled or not.

Abstract

This report uses the Trivago Booking Data to classify hotel bookings as either likely to be canceled or likely to be retained. The results can be used to gain insight into how and why bookings are canceled. The model can also provide a marketing advantage: a hotel can target advertising at customers who are more likely to keep their bookings, and save money by not targeting those who are most likely to cancel.

The Data

The data was originally created and found by Nuno Antonio, Ana Almeida, and Luis Nunes for the following paper

You can access and download the data set from Kaggle

This project does not use all of the data. I used a sample of 50,000 records.
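
The README does not say how the 50,000 records were selected. One common option is a reproducible random sample; the sketch below shows it on a small made-up frame. The column names, the toy data, and the scaled-down sample size are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical stand-in for the full data set (the real file is much larger).
full_data = pd.DataFrame({
    "booking_id": range(1000),
    "is_canceled": [int(i % 3 == 0) for i in range(1000)],
})

# The project used 50,000 rows; 50 is used here only because the toy frame is small.
SAMPLE_SIZE = 50

# A fixed random_state makes the sample reproducible between runs.
sample = full_data.sample(n=SAMPLE_SIZE, random_state=42)
print(sample.shape)
```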

Analysis

37.1% of the customers in this data canceled their reservations.

Class distribution of canceled and retained bookings
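
A class distribution like this can be computed in a couple of lines. The sketch below is illustrative only: the toy data and the column name `is_canceled` (1 = canceled) are assumptions, so the printed share is not the project's 37.1%.

```python
import pandas as pd

# Toy stand-in for the booking data; `is_canceled` is an assumed column name.
bookings = pd.DataFrame({"is_canceled": [1, 0, 0, 1, 0, 0, 0, 1]})

# Fraction of bookings in each class (0 = retained, 1 = canceled).
class_share = bookings["is_canceled"].value_counts(normalize=True)

cancel_rate = class_share.loc[1]
print(f"Canceled: {cancel_rate:.1%}")
```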

Guests stay 3 to 4 days on average, typically one weekend night and two weekday nights.
The average room price is $102.00 per night. The graphs below show that the room price is quite volatile and that it increases significantly every year.

Average Room Price (Every Year)	Average Room Price (Every Day)

In general, we found that customers are more likely to cancel a booking when they:
- Reserved a booking with a full-board meal.
- Booked through the Groups market segment.
- Have no special requests.

Market Segment Percentage	Meal Type Percentage	Total of Special Requests Percentage

However, customers are more likely to retain a booking when they:
- Are repeat guests.
- Come from one of the following market segments:
  - Direct
  - Corporate
  - Complementary
  - Aviation
- Are classified as a group customer type.

Market Segment Percentage	Repeat Order Percentage	Total of Special Requests

These statistics can be useful for marketing towards customers, to increase booking retention, or they can be used to help a company focus on areas of improvement.
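
Comparisons of this kind come down to a cancellation rate per category. The sketch below shows one way to compute it with a pandas group-by. The rows are made up, and the column names `market_segment` and `is_canceled` are assumptions, so the numbers do not reflect the project's findings.

```python
import pandas as pd

# Made-up rows for illustration; the column names are assumed.
bookings = pd.DataFrame({
    "market_segment": ["Groups", "Groups", "Direct", "Direct", "Corporate", "Groups"],
    "is_canceled": [1, 1, 0, 0, 0, 0],
})

# The mean of a 0/1 column is the share of canceled bookings in each segment.
cancel_rate_by_segment = (
    bookings.groupby("market_segment")["is_canceled"]
    .mean()
    .sort_values(ascending=False)
)
print(cancel_rate_by_segment)
```

The same pattern applies to any other categorical column, such as meal type or customer type.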

Modeling

Experiment 1: Scale the data with MinMaxScaler and tune the hyperparameters of the best models

Baseline

Algorithm	Accuracy
LogisticRegression	0.73
LinearDiscriminantAnalysis	0.73
KNeighborsClassifier	0.92
GaussianNB	0.69
SVC	0.92
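
A baseline comparison of these five algorithms might be set up as follows. This is a minimal sketch rather than the notebook's code: the synthetic data from `make_classification`, the 5-fold cross-validation, and the near-default model settings are all assumptions, so the scores will not match the table above.

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

# Synthetic stand-in for the prepared booking features and cancellation labels.
X, y = make_classification(n_samples=300, n_features=8, random_state=0)

models = {
    "LogisticRegression": LogisticRegression(max_iter=1000),
    "LinearDiscriminantAnalysis": LinearDiscriminantAnalysis(),
    "KNeighborsClassifier": KNeighborsClassifier(),
    "GaussianNB": GaussianNB(),
    "SVC": SVC(),
}

baseline_scores = {}
for name, model in models.items():
    # Scaling inside the pipeline means the scaler is fitted on training folds only.
    pipeline = make_pipeline(MinMaxScaler(), model)
    scores = cross_val_score(pipeline, X, y, cv=5, scoring="accuracy")
    baseline_scores[name] = scores.mean()
    print(f"{name}: {scores.mean():.2f}")
```

Putting the scaler in the pipeline, instead of scaling the whole data set first, keeps information from the validation folds out of the fitted minimum and maximum.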

With Hyperparameter Tuning

Algorithm	Accuracy
KNeighborsClassifier	0.94
SVC	0.95
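
The README does not list the search method or the parameter grids that were used. As one possibility, a grid search over a scaled k-nearest-neighbours pipeline could look like the sketch below; the synthetic data and the grid values are assumptions chosen only to keep the example small.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in for the prepared booking data.
X, y = make_classification(n_samples=300, n_features=8, random_state=0)

pipeline = Pipeline([
    ("scale", MinMaxScaler()),
    ("knn", KNeighborsClassifier()),
])

# Assumed grid; step parameters are addressed as "<step name>__<parameter>".
param_grid = {
    "knn__n_neighbors": [3, 5, 7, 9],
    "knn__weights": ["uniform", "distance"],
}

search = GridSearchCV(pipeline, param_grid, cv=5, scoring="accuracy")
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```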

Experiment 2: Use ensemble methods and tune the hyperparameters of the best one

Baseline

Algorithm	Accuracy
AdaBoostClassifier	0.79
GradientBoostingClassifier	0.90
RandomForestClassifier	0.93

With Hyperparameter Tuning

Algorithm	Accuracy
RandomForestClassifier	0.95
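
For a random forest, a randomized search is a common alternative to an exhaustive grid when there are many parameter combinations. The sketch below is an assumption about how such tuning could be done, not the notebook's actual procedure; the data is synthetic and the candidate values are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

# Synthetic stand-in for the prepared booking data.
X, y = make_classification(n_samples=300, n_features=8, random_state=0)

# Assumed candidate values; 27 combinations in total, of which 5 are sampled.
param_distributions = {
    "n_estimators": [25, 50, 100],
    "max_depth": [None, 5, 10],
    "min_samples_leaf": [1, 2, 4],
}

search = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    param_distributions,
    n_iter=5,
    cv=3,
    scoring="accuracy",
    random_state=0,
)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

Tree-based models split on thresholds, so no scaler is included in this sketch.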

Final Model

In my opinion, there is no single correct answer to the question of which model to choose. Usually we choose the model that gives the highest accuracy. In the experiments above, the Support Vector Machine and the Random Forest give the best accuracy (0.95).

Which of the two should we take? That depends on the goal of the project. In this project, I focus on reducing false negatives (FN): cases where the model predicts that the customer will come but the customer actually cancels. Why false negatives? The aim of this model is to increase the hotel's profit, so I look at the problem from a business perspective. If the hotel can predict whether a customer will cancel:

If the customer is predicted to cancel, the hotel does not need to prepare as much for the reservation (e.g. breakfast), which saves the hotel money. A false negative, by contrast, means the hotel prepares for a guest who never arrives.
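
Two models with the same accuracy can still miss a different number of cancellations. The sketch below makes that concrete with made-up labels (1 = canceled); the predictions are hypothetical and are not results from the SVM or Random Forest in this project.

```python
from sklearn.metrics import accuracy_score, confusion_matrix, recall_score

# Made-up ground truth and predictions for ten bookings (1 = canceled).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
pred_a = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]  # misses two cancellations
pred_b = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]  # misses one, raises one false alarm


def false_negatives(y_true, y_pred):
    """Count bookings predicted as retained that were actually canceled."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    return int(fn)


for name, pred in [("A", pred_a), ("B", pred_b)]:
    print(
        name,
        "accuracy:", accuracy_score(y_true, pred),
        "FN:", false_negatives(y_true, pred),
        "recall:", recall_score(y_true, pred),
    )
```

Both hypothetical models reach 0.8 accuracy, but B has fewer false negatives and higher recall on the canceled class, which is the property this section cares about.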

Conclusion

With this information, hotels can, for example, contact clients whom the model predicts will cancel in order to receive the cancellation earlier, which gives them more time to resell the room. Alternatively, they can approach these clients in a way that makes them feel special, so that they keep this reservation and cancel any others they have made at other hotels in the same city.
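
Choosing which clients to contact could be as simple as ranking bookings by predicted cancellation probability. The sketch below assumes such probabilities are already available (for example from a classifier's `predict_proba`; for `SVC` this requires `probability=True`). The booking IDs, the probabilities, and the 0.6 threshold are all hypothetical.

```python
# Hypothetical predicted cancellation probabilities per booking.
predicted = {"B-101": 0.91, "B-102": 0.12, "B-103": 0.67, "B-104": 0.48}

# Assumed cut-off; in practice it would be tuned to the hotel's costs.
CONTACT_THRESHOLD = 0.6

# Keep bookings at or above the threshold, highest risk first.
to_contact = sorted(
    (booking_id for booking_id, p in predicted.items() if p >= CONTACT_THRESHOLD),
    key=lambda booking_id: predicted[booking_id],
    reverse=True,
)
print(to_contact)
```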

Software and Libraries

This project uses the following software and Python libraries:
