
tezzytezzy/credit-risk-anomaly-detection


Credit Card Anomaly Detection

Objective

Experiment in Apache Spark with the binary classification models listed below, each combined with Principal Component Analysis (PCA) for dimensionality reduction, and select the most appropriate model based on the Area Under the ROC Curve (AUC).

  • Logistic Regression
  • Random Forest Classification
  • Linear Support Vector Classification
  • Gradient Boosted Tree Classification
  • Naive Bayes Classification

Installation

The following package needs to be installed:

  • pyspark 2.4.5 (conda build py_0)

Dataset

Statlog (German Credit Data) Data Set, UCI Machine Learning Repository

Reference

Machine Learning with PySpark (ISBN 978-1-4842-4130-1)