Starbucks Capstone Challenge

Machine Learning Nanodegree Capstone Project

by Sooyeon Won

Keywords

Supervised Learning
Binary Classification Models
1. sklearn ensemble models (Random Forest, Gradient Boosting, AdaBoost)
2. XGBoost vs. LightGBM vs. CatBoost
3. Logistic Regression Model (Benchmark)
Imbalanced Data
Synthetic Minority Oversampling Technique (SMOTE)
Evaluation Metrics
Accuracy, Precision, Recall, F1-score
Data Visualisations

Summary of Findings

In this analysis, I examined how Starbucks customers use offers, based on the transaction data. The customer profile dataset contains a few missing values, which I imputed with the median of each feature. Median imputation has two advantages: the fill value lies within the range of the observed values, so it is realistic, and, unlike the mean, it is not pulled towards outliers when the distribution is skewed. I also derived day and month columns from the time column of the transcript dataframe.
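The median imputation described above can be sketched as follows. This is a minimal illustration using only the standard library, not the notebook's actual code: the `impute_median` helper and the sample income values are hypothetical. In pandas, the same step is commonly written as `df[col].fillna(df[col].median())`.

```python
from statistics import median


def impute_median(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


# Hypothetical income column with two missing entries (illustration only).
incomes = [30000, None, 52000, 75000, None, 120000, 64000]
imputed = impute_median(incomes)
print(imputed)
```

Note that the single large value (120000) does not shift the fill value, which is the robustness property mentioned above.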

Using the cleaned data, I explored the current business situation. Traffic varies from month to month and increased until the third month. Sales follow an almost identical pattern, which suggests that more traffic brings better sales performance. Interestingly, although traffic and sales differ from month to month, the average spending per transaction is remarkably similar across the months.

All customers in the profile dataset made purchases at Starbucks, although not all of them received offers. The average customer age is 54.5 years, and the average yearly income is around 65227. There is no significant difference between genders in the types of offers received. Finally, neither the total number of offers nor the number of each offer type is correlated with customers' age, income, or number of days as a Starbucks member.

The number of issued offers is unbalanced across offer types. 'BOGO' and 'Discount' offers are almost evenly distributed, at around 30000 each, whereas 'Informational' offers were issued only about half as often (ca. 15000). Not all issued offers are viewed: only 75.68% (= 57725/76277) are checked by customers. Only half of the issued 'BOGO' and 'Discount' offers are completed.
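The view rate quoted above is a simple ratio and can be checked directly from the two counts given in the text. The variable names are illustrative.

```python
# Counts taken from the summary above.
issued = 76277
viewed = 57725

view_rate = viewed / issued
print(f"View rate: {view_rate:.2%}")
```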

In addition, I explored customer purchasing patterns with RFM analysis, a method for evaluating customer value that is often used in database marketing, especially in the retail and professional services industries. RFM stands for three dimensions: Recency (how recently a customer purchased), Frequency (how often they purchase), and Monetary Value (how much they spend).
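As a rough sketch of how the three RFM dimensions can be derived from transaction records, the example below aggregates per customer. The `rfm_table` helper, the `(customer_id, day, amount)` record layout, and the snapshot day are assumptions made for illustration; the scoring actually used for rfm_score.csv may differ.

```python
from collections import defaultdict


def rfm_table(transactions, snapshot_day):
    """Aggregate (customer_id, day, amount) records into RFM values.

    recency   = days between the customer's last purchase and snapshot_day
    frequency = number of transactions
    monetary  = total amount spent
    """
    last_day = {}
    frequency = defaultdict(int)
    monetary = defaultdict(float)
    for customer, day, amount in transactions:
        last_day[customer] = max(day, last_day.get(customer, day))
        frequency[customer] += 1
        monetary[customer] += amount
    return {
        c: {
            "recency": snapshot_day - last_day[c],
            "frequency": frequency[c],
            "monetary": round(monetary[c], 2),
        }
        for c in last_day
    }


# Hypothetical transactions (illustration only).
transactions = [
    ("a", 1, 5.50),
    ("a", 20, 12.00),
    ("b", 3, 7.25),
    ("a", 28, 3.10),
    ("b", 10, 9.75),
]
rfm = rfm_table(transactions, snapshot_day=30)
for customer, values in rfm.items():
    print(customer, values)
```

In this toy data, customer "a" bought more recently and more often than customer "b", so "a" would rank higher on the recency and frequency dimensions.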

As mentioned in the Capstone proposal, I defined desirably used offers by both Case 1 and Case 2. Based on this definition, I classified all offer usages into two groups, 'desirable' and 'non-desirable', for each offer type. As the first bar chart in Part 2 shows, all three datasets are highly imbalanced. I therefore mitigated the imbalance by applying the Synthetic Minority Oversampling Technique (SMOTE) and then trained various classification models on each dataset.
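To illustrate the idea behind SMOTE, the sketch below creates synthetic minority samples by interpolating between a minority point and one of its nearest minority neighbours. It is a simplified standard-library version written for illustration, not the implementation used in the notebooks: the `smote_oversample` helper, the value of `k`, and the sample points are all assumptions. In practice, the `SMOTE` class from the imbalanced-learn package (with its `fit_resample` method) is the usual choice.

```python
import math
import random


def smote_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points from a list of minority-class points.

    Each synthetic point lies on the line segment between a randomly chosen
    minority point and one of its k nearest minority neighbours.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of base, excluding base itself
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Hypothetical minority class with 4 points; the majority class has 10.
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
majority_size = 10
new_points = smote_oversample(minority, n_new=majority_size - len(minority))
balanced_minority = minority + new_points
print(len(balanced_minority))
```

Because every synthetic point is an interpolation between two real minority points, the new samples stay inside the region already occupied by the minority class rather than being exact duplicates.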

For all offer types, LGBMClassifier showed the best model performance. It achieved f1-scores of 0.7742, 0.7388, and 0.8830 for the BOGO, discount, and informational datasets, respectively, and did so in a shorter time. These f1-scores are considerably higher than those of the benchmark model, and the running time is much shorter, so the LGBMClassifier model is more efficient than the benchmark.
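For reference, the f1-score reported above is the harmonic mean of precision and recall. The sketch below computes all three from true and predicted labels. The `precision_recall_f1` helper and the label vectors are hypothetical and unrelated to the project's actual predictions.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall, and f1-score for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical labels: 1 = 'desirable', 0 = 'non-desirable'.
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
y_pred = [1, 0, 1, 0, 1, 1, 0, 1]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(f"precision={precision:.4f} recall={recall:.4f} f1={f1:.4f}")
```

Unlike accuracy, the f1-score ignores true negatives, which is why it is a more informative metric when the classes are imbalanced.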

