Bank Churn Prediction is a project that helps banks predict customer churn so they can apply proactive retention strategies. This repository contains the code and resources for data preprocessing, model building, evaluation, and prediction.
In the banking industry, customer retention is crucial for sustainable growth. It's more cost-effective to retain existing customers than to acquire new ones. However, predicting when customers might leave (churn) is challenging. This project leverages machine learning techniques to analyze historical banking data, identify patterns, and predict potential churn.
The repository is organized as follows:
- churn.csv: Dataset containing historical bank customer data.
- holdout_churn.csv: Holdout dataset on which predictions are made.
- holdout_churn_result.csv: Actual churn outcomes for the holdout data.
- bank_churn_prediction.ipynb: Jupyter Notebook containing the Python code for data preprocessing, model building, evaluation, and prediction.
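As a minimal sketch of how the three CSV files listed above might be loaded with pandas: the helper names `load_churn_files` and `summarize` and the `data_dir` parameter are hypothetical and are not taken from the notebook. Only the file names come from this repository, and nothing is assumed about the columns.

```python
from pathlib import Path

import pandas as pd


def load_churn_files(data_dir="."):
    """Read the three CSV files named in this README from data_dir.

    Hypothetical helper: the notebook may load the files differently.
    """
    data_dir = Path(data_dir)
    train = pd.read_csv(data_dir / "churn.csv")
    holdout = pd.read_csv(data_dir / "holdout_churn.csv")
    holdout_actual = pd.read_csv(data_dir / "holdout_churn_result.csv")
    return train, holdout, holdout_actual


def summarize(df):
    """Return the row count, column count, and total number of missing cells."""
    return {
        "rows": len(df),
        "columns": df.shape[1],
        "missing": int(df.isna().sum().sum()),
    }
```

Calling `load_churn_files(".")` from the repository root returns the three DataFrames, and `summarize` gives a quick sanity check on each before any cleaning.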
🔍 Data Exploration: We explore the dataset, clean it, and identify the features most relevant to churn.
🛠️ Model Building: Using machine learning models like Logistic Regression, SVM, and Random Forest, we predict which customers might leave.
📊 Model Evaluation: We assess the performance of each model using accuracy, precision, recall, and F1 score metrics.
🔎 Hyperparameter Tuning: We fine-tune the models with techniques such as Randomized Search to improve their performance.
📈 Feature Importance Analysis: We examine how much each feature contributes to predicting customer churn.
📋 Holdout Data Prediction: We make predictions on unseen holdout data to evaluate the model's real-world effectiveness.
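The workflow above can be sketched end to end with scikit-learn. This is an illustrative outline under stated assumptions, not the notebook's actual code: the data is synthetic (generated with `make_classification`) rather than read from churn.csv, the held-out test split stands in for the holdout file, and the search space, scoring metric, and helper name `evaluate` are arbitrary choices made for the example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import RandomizedSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Assumption: synthetic, imbalanced data standing in for the cleaned churn features.
X, y = make_classification(
    n_samples=500, n_features=8, n_informative=4, weights=[0.8, 0.2], random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

# The three model families named above; the linear and SVM models get scaled inputs.
models = {
    "logistic_regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "svm": make_pipeline(StandardScaler(), SVC()),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=42),
}


def evaluate(model, X_eval, y_eval):
    """Compute accuracy, precision, recall, and F1 for a fitted model."""
    pred = model.predict(X_eval)
    return {
        "accuracy": accuracy_score(y_eval, pred),
        "precision": precision_score(y_eval, pred, zero_division=0),
        "recall": recall_score(y_eval, pred, zero_division=0),
        "f1": f1_score(y_eval, pred, zero_division=0),
    }


results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    results[name] = evaluate(model, X_test, y_test)

# Randomized search over a small, illustrative random forest search space.
search = RandomizedSearchCV(
    RandomForestClassifier(random_state=42),
    param_distributions={
        "n_estimators": [50, 100, 200],
        "max_depth": [None, 4, 8],
        "min_samples_leaf": [1, 2, 4],
    },
    n_iter=5,
    scoring="f1",
    cv=3,
    random_state=42,
)
search.fit(X_train, y_train)
results["random_forest_tuned"] = evaluate(search.best_estimator_, X_test, y_test)

# Feature importances of the tuned forest as (feature index, importance), highest first.
importances = search.best_estimator_.feature_importances_
ranked_features = sorted(enumerate(importances), key=lambda pair: pair[1], reverse=True)

for name, metrics in results.items():
    print(name, {metric: round(value, 3) for metric, value in metrics.items()})
```

Scoring the search on F1 rather than accuracy is a design choice for this sketch: churn labels are often imbalanced, so accuracy alone can look strong for a model that rarely predicts churn.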
- Python
- pandas
- scikit-learn
- matplotlib
- seaborn