Identifying specific groups in the customer base with K-Means clustering

secil-carver/KMeans-Cluster-Analysis

KMeans-Cluster-Analysis

Identifying specific groups in the customer base with K-Means clustering lets us find patterns among data points that have no labels attached to them. Once these structures are discovered, we can make more informed decisions about our customers.

Research Question

Can we identify specific groups with shared patterns in our customer base using the K-Means clustering method?

The Goal of the Data Analysis

By identifying specific groups in the customer base, we will be able to recognize and define patterns among customers. This information will allow us to build deeper analyses and predictions of customer churn.

Technique Justification

Clustering identifies patterns among data points that have no labels attached to them. Revealing these patterns uncovers structure in the data that was not obvious before, and once that structure is discovered, making informed decisions becomes easier.

The clustering technique I used was K-Means clustering. K-Means is an unsupervised clustering algorithm that can cluster large datasets quickly. The algorithm first initializes k centroids (for example, by choosing k data points at random). It then alternates two steps: it assigns every data point to the cluster of its nearest centroid, and it moves each centroid to the mean of the points assigned to it. These steps repeat until the assignments stop changing or a maximum number of iterations is reached. The resulting cluster labels can then be added back to the dataset to reveal insights.
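The assign-then-update loop (Lloyd's algorithm) can be sketched in plain Python. This is a minimal illustration, not the code behind this analysis: the function names are hypothetical, points are assumed to be numeric tuples, and instead of random initialization it uses a deterministic farthest-first choice of starting centroids so the result is reproducible. Library implementations such as scikit-learn's `KMeans` default to k-means++ initialization.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length numeric tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, max_iter=100):
    """Cluster `points` into k groups with Lloyd's algorithm.

    Returns (centroids, labels), where labels[i] is the cluster index of points[i].
    """
    # Initialization (assumption): farthest-first traversal starting from the
    # first point, so the sketch is deterministic. Random or k-means++ seeding
    # is more common in practice.
    centroids = [tuple(points[0])]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(squared_distance(p, c) for c in centroids))
        centroids.append(tuple(farthest))

    labels = None
    for _ in range(max_iter):
        # Assignment step: every point joins the cluster of its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # converged: no point changed cluster
        labels = new_labels
        # Update step: move each centroid to the mean of its assigned points.
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, labels


# Usage with made-up two-feature data.
customers = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
centroids, labels = kmeans(customers, k=2)
# Attach each cluster label to its record, i.e. put the clusters back into the dataset.
clustered = list(zip(customers, labels))
print(labels)  # [0, 0, 0, 1, 1, 1]
```

Farthest-first seeding is sensitive to outliers, which is one reason library defaults prefer k-means++; it is used here only to keep the example's output fixed.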

Elbow Method

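The Elbow Method helps choose the optimal number of clusters, k. The model is fitted for a range of values of k, and the within-cluster sum of squares (WCSS, also called inertia) is plotted against k. WCSS generally falls as k grows, so the best fit is at the "elbow" point, where adding another cluster stops reducing it substantially, as shown in the example below.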

[Figure: example Elbow Method chart]
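Reading the elbow off a chart can also be made concrete. The sketch below assumes the model has already been fitted for each k (for example with scikit-learn's `KMeans`, whose `inertia_` attribute holds the WCSS) and picks the elbow with a simple geometric heuristic: after scaling both axes to [0, 1], it returns the k whose point lies farthest from the straight line joining the first and last points of the curve. The helper names and the heuristic are assumptions for illustration; the README itself identifies the elbow visually.

```python
import math


def wcss(points, labels, centroids):
    """Within-cluster sum of squared distances from each point to its centroid."""
    return sum(
        sum((a - b) ** 2 for a, b in zip(p, centroids[label]))
        for p, label in zip(points, labels)
    )


def elbow_k(ks, inertias):
    """Return the k at the 'elbow' of a WCSS-versus-k curve (ks in ascending order).

    Both axes are scaled to [0, 1]; the elbow is the point farthest from the
    chord joining the first and last points (a heuristic, not a formal test).
    """
    lo, hi = min(inertias), max(inertias)
    if hi == lo or len(ks) < 3:
        return ks[0]  # flat or too-short curve: no elbow to find
    xs = [(k - ks[0]) / (ks[-1] - ks[0]) for k in ks]
    ys = [(w - lo) / (hi - lo) for w in inertias]
    # Chord through (0, ys[0]) and (1, ys[-1]): slope * x - y + ys[0] = 0.
    slope = ys[-1] - ys[0]
    norm = math.hypot(slope, 1.0)
    distances = [abs(slope * x - y + ys[0]) / norm for x, y in zip(xs, ys)]
    return ks[distances.index(max(distances))]


ks = [1, 2, 3, 4, 5, 6]
inertias = [100.0, 40.0, 15.0, 12.0, 10.0, 9.0]  # illustrative values, not results from this analysis
print(elbow_k(ks, inertias))  # 3
print(wcss([(0.0, 0.0), (2.0, 0.0)], [0, 0], [(1.0, 0.0)]))  # 2.0
```

The heuristic only formalizes what the eye does on the chart; when the curve has no clear bend, it still returns some k, so the plot remains worth inspecting.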

Silhouette Method

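The Silhouette Method gives a visual interpretation of how well the data is clustered, providing graphical validation of how well each object is classified. Each object receives a silhouette score between -1 and 1 that measures how well it fits within its own cluster compared with the nearest neighboring cluster. A high average silhouette indicates good clustering. Scores close to zero indicate that the object lies on or near the decision boundary between two clusters, and negative scores suggest it may have been assigned to the wrong cluster.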

[Figure: example silhouette plot]
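The silhouette score of a point i is s(i) = (b(i) - a(i)) / max(a(i), b(i)), where a(i) is its mean distance to the other members of its own cluster and b(i) is its mean distance to the members of the nearest other cluster. A plain-Python sketch follows. The function names are hypothetical stand-ins (scikit-learn provides `silhouette_samples` and `silhouette_score` in `sklearn.metrics`); Euclidean distance is assumed, at least two clusters are required, and points in singleton clusters get a score of 0 by convention.

```python
import math


def silhouette_values(points, labels):
    """Silhouette coefficient s(i) for every point (Euclidean distance).

    Requires at least two distinct cluster labels.
    """
    clusters = {}
    for p, label in zip(points, labels):
        clusters.setdefault(label, []).append(p)

    scores = []
    for p, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # singleton cluster: defined as 0
            continue
        # a(i): mean distance to the other members of the point's own cluster
        # (the distance from p to itself is 0, hence the len(own) - 1 divisor).
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b(i): mean distance to the members of the nearest other cluster.
        b = min(
            sum(math.dist(p, q) for q in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return scores


def mean_silhouette(points, labels):
    """Average silhouette over all points; higher means better-separated clusters."""
    scores = silhouette_values(points, labels)
    return sum(scores) / len(scores)


points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
print(round(mean_silhouette(points, [0, 0, 1, 1]), 3))  # 0.9 (well-separated clusters)
print(round(mean_silhouette(points, [0, 1, 0, 1]), 3))  # -0.448 (points in the wrong clusters)
```

Computing the score for several values of k and keeping the k with the highest average silhouette is a common complement to the Elbow Method.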