


utsavchaudharygithub/Credit_Risk_Analysis


Credit_Risk_Analysis

Credit risk is an inherently unbalanced classification problem, because good loans far outnumber risky loans. We therefore needed different techniques to train and evaluate models with unbalanced classes. Jill asked us to use the imbalanced-learn and scikit-learn libraries to build and evaluate models using resampling.

The purpose:

  • Use Resampling Models to Predict Credit Risk
  • Use the SMOTEENN algorithm to Predict Credit Risk
  • Use Ensemble Classifiers to Predict Credit Risk
  • A Written Report on the Credit Risk Analysis (README.md)

Resources used:

LoanStats_2019Q1.csv, credit_risk_resampling_starter_code.ipynb, and credit_risk_ensemble_starter_code.ipynb.

Applications used:

Jupyter Notebook

Algorithms used:

  • Resampling Models
  • Ensemble Classifiers
  • SMOTEENN algorithm
  • Random Forest Classifier
  • SMOTE Algorithm


Use Resampling Models to Predict Credit Risk:

We evaluated three machine learning models by using resampling to determine which is better at predicting credit risk. We used the oversampling RandomOverSampler and SMOTE algorithms, and then the undersampling ClusterCentroids algorithm. With each algorithm, we resampled the dataset, viewed the count of the target classes, trained a logistic regression classifier, calculated the balanced accuracy score, generated a confusion matrix, and generated a classification report.

  • Balanced accuracy score: 65%
  • Precision high risk: 0.01%
  • Precision low risk: 1%
  • Recall high risk: 63%
  • Recall low risk: 67%


Use the SMOTE Algorithm to Predict Credit Risk:

Using the SMOTE oversampling algorithm, we resampled the dataset, viewed the count of the target classes, trained a logistic regression classifier, calculated the balanced accuracy score, generated a confusion matrix, and generated a classification report.


  • Balanced accuracy score: 79%
  • Precision high risk: 0.1%
  • Precision low risk: 1%
  • Recall high risk: 64%
  • Recall low risk: 66%

Random Forest Classifier:


  • Balanced accuracy score: 79%
  • Precision high risk: 0.4%
  • Precision low risk: 1%
  • Recall high risk: 67%
  • Recall low risk: 91%


Use the SMOTEENN Algorithm to Predict Credit Risk:

We used a combinatorial approach of over- and undersampling with the SMOTEENN algorithm to determine whether the combinatorial approach is better at predicting credit risk than the resampling algorithms. Using the SMOTEENN algorithm, we resampled the dataset, viewed the count of the target classes, trained a logistic regression classifier, calculated the balanced accuracy score, generated a confusion matrix, and generated a classification report.


  • Balanced accuracy score: 61%
  • Precision high risk: 0.1%
  • Precision low risk: 1%
  • Recall high risk: 69%
  • Recall low risk: 55%


Use Ensemble Classifiers to Predict Credit Risk:

Using the imblearn.ensemble library, we trained and compared two ensemble classifiers, BalancedRandomForestClassifier and EasyEnsembleClassifier, to predict credit risk, and evaluated each model. With both algorithms, we resampled the dataset, viewed the count of the target classes, trained the ensemble classifier, calculated the balanced accuracy score, generated a confusion matrix, and generated a classification report.


  • Balanced accuracy score: 92%
  • Precision high risk: 7%
  • Precision low risk: 1%
  • Recall high risk: 91%
  • Recall low risk: 94%


Summary on the Credit Risk Analysis

Algorithms used were:

  • Resampling Models
  • Ensemble Classifiers
  • SMOTEENN algorithm
  • Random Forest Classifier
  • SMOTE Algorithm

Among all the algorithms, the ensemble classifiers from the imblearn.ensemble library performed best, with a balanced accuracy of 92% and a high-risk recall of 91%. For the high-risk class, every algorithm has much higher recall than precision. Balanced accuracy ranges from 0 to 1, and the model whose score is closest to 1 is the best machine learning model. Hence, this algorithm is recommended.
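For reference, scikit-learn computes balanced accuracy as the average of the per-class recall values, which is why it is more informative than plain accuracy when one class dominates. The small check below uses made-up labels (not results from this analysis) to show the relationship.

```python
from sklearn.metrics import accuracy_score, balanced_accuracy_score, recall_score

# Made-up labels for illustration: 8 low-risk and 2 high-risk loans.
y_true = ["low_risk"] * 8 + ["high_risk"] * 2
y_pred = ["low_risk"] * 6 + ["high_risk"] * 2 + ["high_risk", "low_risk"]

# Recall for each class, in the order given by `labels`.
recalls = recall_score(
    y_true, y_pred, average=None, labels=["high_risk", "low_risk"]
)
mean_recall = recalls.mean()

balanced = balanced_accuracy_score(y_true, y_pred)
plain = accuracy_score(y_true, y_pred)

print("Per-class recall:", recalls)    # high_risk: 1/2, low_risk: 6/8
print("Mean recall:", mean_recall)     # 0.625
print("Balanced accuracy:", balanced)  # 0.625
print("Plain accuracy:", plain)        # 0.7
```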
