Customer segmentation using RFM Analysis and K-means clustering in Apache Spark

d-vignesh/CustomerSegmentationWithSpark

A model that segments customers based on their purchase history. It uses RFM (recency, frequency, monetary) analysis and K-means clustering to group similar customers, and it is implemented in Apache Spark. The dataset is the E-commerce data provided on Kaggle: https://www.kaggle.com/carrie1/ecommerce-data/ . One issue I faced with the dataset is that Spark could not parse the date format of the InvoiceDate column. I was not sure whether this was due to my environment, so I used a Python script (provided in this repo) to reformat InvoiceDate.
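For illustration, here is a minimal standard-library sketch of the two ideas above: rewriting InvoiceDate into a layout Spark parses by default, and deriving the three RFM values per customer. It is not the script shipped in this repo. The raw date layout (`12/1/2010 8:26`), the row fields (customer id, invoice number, date, quantity, unit price), and the function names `reformat_invoice_date` and `rfm_table` are assumptions made for the example.

```python
from collections import defaultdict
from datetime import datetime

# Assumed raw layout of InvoiceDate, e.g. "12/1/2010 8:26" (month/day/year hour:minute).
RAW_FORMAT = "%m/%d/%Y %H:%M"
# "yyyy-MM-dd HH:mm:ss" style layout, which Spark casts to a timestamp by default.
SPARK_FORMAT = "%Y-%m-%d %H:%M:%S"


def reformat_invoice_date(raw):
    """Convert one raw InvoiceDate string to the zero-padded ISO-like layout."""
    return datetime.strptime(raw.strip(), RAW_FORMAT).strftime(SPARK_FORMAT)


def rfm_table(rows, snapshot):
    """Compute (recency_days, frequency, monetary) per customer.

    rows: iterable of (customer_id, invoice_no, invoice_date, quantity, unit_price),
          with invoice_date in the assumed raw layout.
    snapshot: datetime used as "today" when measuring recency.
    """
    last_purchase = {}
    invoices = defaultdict(set)
    monetary = defaultdict(float)

    for customer_id, invoice_no, invoice_date, quantity, unit_price in rows:
        when = datetime.strptime(invoice_date.strip(), RAW_FORMAT)
        # Recency is measured from the most recent purchase.
        if customer_id not in last_purchase or when > last_purchase[customer_id]:
            last_purchase[customer_id] = when
        # Frequency counts distinct invoices, not line items.
        invoices[customer_id].add(invoice_no)
        # Monetary is total spend: quantity times unit price, summed.
        monetary[customer_id] += quantity * unit_price

    return {
        cid: (
            (snapshot - last_purchase[cid]).days,
            len(invoices[cid]),
            round(monetary[cid], 2),
        )
        for cid in last_purchase
    }


if __name__ == "__main__":
    # Made-up sample rows, for illustration only.
    sample = [
        ("A", "1001", "12/1/2010 8:26", 6, 2.55),
        ("A", "1002", "12/5/2010 10:00", 2, 1.00),
        ("B", "1003", "12/1/2010 8:34", 1, 3.00),
    ]
    print(reformat_invoice_date("12/1/2010 8:26"))
    print(rfm_table(sample, datetime(2010, 12, 10)))
```

The resulting three values per customer are the features that K-means would then cluster. In practice they are usually scaled first, since monetary totals can dwarf the other two.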
