
A hybrid model consisting of ensemble classifier, k-prototype clustering and association rule mining models for customer churn analysis using majority voting technique for both feature selection and churn prediction on telecommunication dataset (IBM Watson Dataset).

arghac14/Customer-Churn-Analysis

Customer-Churn-Analysis


Project Summary:

We find the best features of the dataset by comparing the rules obtained from a Decision Tree Classifier, clustering models (K-Modes and K-Prototypes) and Association Rule Mining (Apriori algorithm). We then train several classifier models, together with a Voting Classifier (ensemble learning), on these best features to see whether they boost prediction accuracy.

Workflow:

Collecting Data:

We used the following telecom service customer churn dataset for this project: WA_Fn-UseC_-Telco-Customer-Churn.csv

Data preprocessing:

We cleaned the dataset and converted the categorical features into dummy (indicator) variables for classification. The resulting dataset is new_telco.csv.
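Assuming "dummy datas" refers to dummy (one-hot) variables, the encoding step can be sketched in plain Python as below. In practice `pandas.get_dummies` does this in one call; the helper `dummy_encode` and the toy rows are hypothetical and are not taken from the Telco file.

```python
def dummy_encode(rows, categorical_cols):
    """Replace each categorical column with one 0/1 indicator column per value.

    Hypothetical helper: rows are dicts keyed by column name.
    """
    # Distinct values of every categorical column, sorted for a stable column order.
    levels = {col: sorted({row[col] for row in rows}) for col in categorical_cols}
    encoded = []
    for row in rows:
        # Keep the non-categorical columns unchanged.
        new_row = {k: v for k, v in row.items() if k not in categorical_cols}
        for col in categorical_cols:
            for level in levels[col]:
                new_row[f"{col}_{level}"] = int(row[col] == level)
        encoded.append(new_row)
    return encoded


# Toy rows for illustration only.
rows = [
    {"tenure": 1, "InternetService": "DSL", "Churn": "No"},
    {"tenure": 34, "InternetService": "Fiber optic", "Churn": "Yes"},
    {"tenure": 2, "InternetService": "No", "Churn": "No"},
]
# Map the Yes/No target to 1/0 so it can serve as a class label.
for row in rows:
    row["Churn"] = 1 if row["Churn"] == "Yes" else 0

encoded = dummy_encode(rows, ["InternetService"])
print(encoded[1])
# {'tenure': 34, 'Churn': 1, 'InternetService_DSL': 0, 'InternetService_Fiber optic': 1, 'InternetService_No': 0}
```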

Initial classification using Decision Tree Classifier:

First, we performed an initial classification by training a Decision Tree classifier on all the features of the dataset. Using the entropy criterion with a maximum depth of 5, it reached an accuracy of 79.83%. See the notebook.
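The "entropy" criterion scores a split by how much it reduces the Shannon entropy of the class labels (information gain). Assuming the notebook used something like scikit-learn's `DecisionTreeClassifier(criterion="entropy", max_depth=5)`, which the README does not confirm, the quantity it maximizes can be sketched as follows. This sketch uses a multiway split on a categorical feature for simplicity, whereas scikit-learn builds binary splits; the toy data is illustrative only.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """Entropy reduction obtained by splitting `labels` on a categorical feature."""
    n = len(labels)
    remainder = 0.0
    for value in set(feature_values):
        subset = [y for x, y in zip(feature_values, labels) if x == value]
        # Weighted entropy of each branch after the split.
        remainder += len(subset) / n * entropy(subset)
    return entropy(labels) - remainder


# Toy data (illustrative only): contract type vs. churn label.
contract = ["Month-to-month", "Month-to-month", "Two year", "Two year"]
churn = [1, 1, 0, 0]
print(entropy(churn))                     # 1.0
print(information_gain(contract, churn))  # 1.0 (the split separates the classes perfectly)
```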

Implementation of Clustering models:

We implemented K-Modes and K-Prototypes clustering to obtain the clusters and their centroids over the features of the dataset. These are then used to find the best features of the dataset. See the notebook.
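K-Prototypes handles mixed data by combining squared Euclidean distance on numeric attributes with a weighted count of mismatches on categorical attributes; prototypes are updated with the mean (numeric) and the mode (categorical). A library implementation such as the `kmodes` package's `KPrototypes` class may have been used in the notebook, but that is an assumption. The minimal plain-Python sketch below only illustrates the algorithm. The function names, the weight `gamma`, and the toy records (tenure, InternetService) are hypothetical.

```python
from collections import Counter


def kproto_distance(x, proto, num_idx, cat_idx, gamma):
    """Squared Euclidean distance on numeric attributes plus gamma times the
    number of mismatching categorical attributes."""
    num = sum((x[j] - proto[j]) ** 2 for j in num_idx)
    cat = sum(x[j] != proto[j] for j in cat_idx)
    return num + gamma * cat


def kprototypes(data, init_protos, num_idx, cat_idx, gamma=1.0, n_iter=10):
    """Run a fixed number of assignment/update rounds from the given prototypes."""
    protos = [list(p) for p in init_protos]
    for _ in range(n_iter):
        # Assignment step: each record goes to its nearest prototype.
        labels = [
            min(range(len(protos)),
                key=lambda k: kproto_distance(x, protos[k], num_idx, cat_idx, gamma))
            for x in data
        ]
        # Update step: mean for numeric attributes, mode for categorical ones.
        for k in range(len(protos)):
            members = [x for x, lab in zip(data, labels) if lab == k]
            if not members:
                continue  # leave an empty cluster's prototype unchanged
            for j in num_idx:
                protos[k][j] = sum(m[j] for m in members) / len(members)
            for j in cat_idx:
                protos[k][j] = Counter(m[j] for m in members).most_common(1)[0][0]
    return labels, protos


# Toy records: (tenure, InternetService). Column 0 is numeric, column 1 categorical.
data = [(1, "Fiber optic"), (2, "Fiber optic"), (60, "DSL"), (70, "DSL")]
labels, protos = kprototypes(data, [data[0], data[2]], num_idx=[0], cat_idx=[1])
print(labels)  # [0, 0, 1, 1]
print(protos)  # [[1.5, 'Fiber optic'], [65.0, 'DSL']]
```

In a real dataset the numeric attributes are usually scaled first, otherwise a large-range column such as tenure dominates the categorical term.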

Association Rule Mining:

We implemented the Apriori algorithm (association rule mining) to extract rules among the features, which we use to identify the best features of the dataset. See the notebooks.
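Apriori grows frequent itemsets level by level: it joins frequent (k-1)-itemsets into k-item candidates, prunes any candidate with an infrequent subset, and keeps candidates whose support meets a threshold. Rules are then scored by confidence, conf(A → B) = support(A ∪ B) / support(A). The sketch below is a plain-Python illustration, not the notebooks' code; the "Feature=value" item encoding, the toy transactions and `min_support=0.5` are assumptions.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for all itemsets whose support
    (fraction of transactions containing them) is at least min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        return sum(itemset <= t for t in transactions) / n

    # Level 1: frequent single items.
    items = {item for t in transactions for item in t}
    current = {frozenset([i]) for i in items if support(frozenset([i])) >= min_support}
    frequent = {s: support(s) for s in current}
    k = 2
    while current:
        # Join step: union pairs of frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in current for s in combinations(c, k - 1))}
        current = {c for c in candidates if support(c) >= min_support}
        frequent.update({s: support(s) for s in current})
        k += 1
    return frequent


def confidence(frequent, antecedent, consequent):
    """conf(A -> B) = support(A | B) / support(A)."""
    a = frozenset(antecedent)
    return frequent[a | frozenset(consequent)] / frequent[a]


# Toy transactions: one set of "Feature=value" items per customer.
transactions = [
    {"Contract=Month-to-month", "InternetService=Fiber optic", "Churn=1"},
    {"Contract=Month-to-month", "InternetService=Fiber optic", "Churn=1"},
    {"Contract=Month-to-month", "InternetService=DSL", "Churn=0"},
    {"Contract=Two year", "InternetService=DSL", "Churn=0"},
]
frequent = apriori(transactions, min_support=0.5)
print(confidence(frequent, ["InternetService=Fiber optic"], ["Churn=1"]))  # 1.0
```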

Most important frequent items when Churn=0:

Most important frequent items when Churn=1:

Finding the best features:

Comparing the results of the Decision Tree classifier, clustering and association rule mining, we obtain the following best features: 'tenure', 'InternetService' and 'PhoneService'.
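The project description mentions majority voting for feature selection. One way to read that is sketched below: each method nominates features, and a feature is kept when more than half of the methods nominate it. The per-method lists are hypothetical and are not the actual outputs of the notebooks; the function name is likewise illustrative.

```python
from collections import Counter


def majority_vote_features(selections):
    """Keep every feature nominated by more than half of the methods."""
    # set() ensures a method cannot vote twice for the same feature.
    votes = Counter(f for chosen in selections.values() for f in set(chosen))
    threshold = len(selections) / 2
    return sorted(f for f, v in votes.items() if v > threshold)


# Hypothetical per-method selections, for illustration only.
selections = {
    "decision_tree": ["tenure", "InternetService", "Contract"],
    "clustering": ["tenure", "PhoneService", "InternetService"],
    "association_rules": ["tenure", "PhoneService", "PaymentMethod"],
}
print(majority_vote_features(selections))
# ['InternetService', 'PhoneService', 'tenure']
```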

Implementing other classifiers:

We also implemented other classifiers, namely K-Nearest Neighbors, Logistic Regression, Support Vector Machine, Random Forest and Naive Bayes, first using all the features and then using only the best features of the dataset, and compared the accuracy of the resulting models. See the notebooks.
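The all-features versus best-features comparison amounts to training and scoring the same model on two column sets. The sketch below shows this with a hand-rolled k-nearest-neighbors classifier on toy data, not the Telco data. Column 2 is a deliberately irrelevant feature, and the column indices in `features` stand in for the selected best features. The accuracies it produces describe only this toy setup and do not reflect the project's results.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k=3, features=None):
    """Majority label among the k nearest training rows (Euclidean distance).
    `features` optionally restricts the distance to a subset of column indices."""
    cols = range(len(x)) if features is None else features
    dists = sorted(
        (math.sqrt(sum((row[j] - x[j]) ** 2 for j in cols)), label)
        for row, label in zip(train_X, train_y)
    )
    top = [label for _, label in dists[:k]]
    return Counter(top).most_common(1)[0][0]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy columns: [tenure in years, has fiber (0/1), irrelevant noise].
train_X = [[0.1, 1, 9], [0.3, 1, 9], [5.0, 0, 0], [6.0, 0, 0]]
train_y = [1, 1, 0, 0]
test_X = [[0.2, 1, 0], [5.5, 0, 9]]
test_y = [1, 0]

preds_all = [knn_predict(train_X, train_y, x, k=1) for x in test_X]
preds_best = [knn_predict(train_X, train_y, x, k=1, features=[0, 1]) for x in test_X]
print(accuracy(test_y, preds_all), accuracy(test_y, preds_best))  # 0.0 1.0
```

Distance-based models such as k-NN are especially sensitive to irrelevant features, which is why this toy example exaggerates the effect of dropping one.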

Accuracy report:

With all features:

With best features:

Implementing Voting Classifier (Ensemble Learning):

We implemented a Voting Classifier, which combines several classifier models by majority voting into a single predictive model, with the aim of improving performance over the individual models. See the notebook.
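Hard (majority) voting predicts, for each sample, the label chosen by most of the base models. scikit-learn offers this as `VotingClassifier(voting="hard")`. The minimal sketch below shows only the combination rule applied to already-computed predictions. The three prediction lists are hypothetical, and the tie-breaking rule (first label encountered among the models wins) is an assumption of this sketch.

```python
from collections import Counter


def hard_vote(predictions):
    """Combine per-model label lists by majority vote, sample by sample.
    Ties go to the label encountered first (Counter.most_common ordering)."""
    return [Counter(labels).most_common(1)[0][0] for labels in zip(*predictions)]


# Hypothetical predictions of three base models on four customers.
knn = [1, 0, 1, 0]
logreg = [1, 1, 0, 0]
forest = [0, 1, 1, 0]
print(hard_vote([knn, logreg, forest]))  # [1, 1, 1, 0]
```

Using an odd number of base models on a binary target, as here, avoids ties altogether.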

Final Result:


Languages

  • Jupyter Notebook 100.0%