Assignment07-Clustering

ExcelR Data Science Assignment No 7

Clustering :

It can be defined as a way of grouping data points into clusters of similar data points. Objects that are similar to one another stay in the same group, and that group has little or no similarity to the other groups.

It works by finding similar patterns in the unlabelled dataset, such as shape, size, colour, or behaviour, and dividing the data points according to the presence or absence of those patterns.

It is an unsupervised learning method: no supervision is provided to the algorithm, and it works with unlabelled data.

After clustering, each cluster or group is assigned a cluster ID. An ML system can use this ID to simplify the processing of large and complex datasets. Clustering is commonly used for statistical data analysis.

1. Hierarchical Clustering :

Hierarchical clustering, also known as hierarchical cluster analysis (HCA), is an unsupervised machine learning algorithm that groups unlabelled data into clusters.

In this algorithm, we build a hierarchy of clusters in the form of a tree; this tree-shaped structure is known as a dendrogram. The results of K-means clustering and hierarchical clustering may sometimes look similar, but the two algorithms work differently: hierarchical clustering does not need the number of clusters to be fixed in advance, whereas K-means does.
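The tree-building idea can be illustrated with a small from-scratch sketch. It assumes the bottom-up (agglomerative) variant with single linkage, which the text above does not specify, and it uses made-up toy points rather than the assignment data. The function name `agglomerative` and the `merges` record are hypothetical names chosen for illustration.

```python
import math

def agglomerative(points, n_clusters):
    """Bottom-up clustering with single linkage (illustrative sketch only)."""
    # Start with every point in its own cluster; clusters hold point indices.
    clusters = [[i] for i in range(len(points))]
    merges = []  # merge history: (members_a, members_b, distance)

    def single_link(a, b):
        # Single linkage: distance between the closest pair of members.
        return min(math.dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > n_clusters:
        # Find the pair of clusters that are closest to each other.
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = single_link(clusters[x], clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        d, x, y = best
        merges.append((sorted(clusters[x]), sorted(clusters[y]), d))
        # Merge cluster y into cluster x and drop y.
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return clusters, merges

# Toy data (assumed for illustration): two well-separated groups.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0)]
clusters, merges = agglomerative(points, n_clusters=2)
print(clusters)
for a, b, d in merges:
    print(f"merged {a} + {b} at distance {d:.3f}")
```

The `merges` list is the information a dendrogram draws: each entry is one join in the tree, and the distance is the height at which the join happens. Cutting the tree at a chosen height corresponds to stopping the loop early.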

2. K-Means algorithm :

The k-means algorithm is one of the most popular clustering algorithms. It partitions the dataset by dividing the samples into clusters of roughly equal variance, minimizing the within-cluster sum of squared distances to each cluster's centroid. The number of clusters must be specified in advance. It is fast and needs relatively few computations: each iteration is linear in the number of samples, O(n), for a fixed number of clusters and features.
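A minimal sketch of the assign-and-update loop behind k-means (Lloyd's algorithm) is shown here. It assumes Euclidean distance, explicit starting centroids so the run is deterministic, and made-up toy points; the names `kmeans` and `max_iter` are illustrative, not taken from the notebooks.

```python
import math

def kmeans(points, centroids, max_iter=100):
    """Lloyd's algorithm from the given starting centroids (illustrative sketch)."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centroids.append(
                    tuple(sum(coord) / len(members) for coord in zip(*members))
                )
            else:
                new_centroids.append(centroids[k])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return labels, centroids

# Toy data (assumed for illustration): two compact groups.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (8.0, 9.0), (9.0, 8.0)]
labels, centroids = kmeans(points, centroids=[points[0], points[3]])
print(labels)
print(centroids)
```

Each pass computes one distance per point per centroid, which is where the linear dependence on the number of samples comes from.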

3. DBSCAN Algorithm :

DBSCAN stands for Density-Based Spatial Clustering of Applications with Noise. It is a density-based model similar to mean-shift, but with some notable advantages. In this algorithm, areas of high density are separated by areas of low density, so clusters of arbitrary shape can be found, and points in low-density areas are labelled as noise rather than assigned to a cluster.
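The density idea can be sketched from scratch as follows. This is a simplified illustration, assuming Euclidean distance, a neighbourhood radius `eps`, and a `min_samples` threshold that counts the point itself; the toy points and parameter values are made up and are not tuned for the assignment data.

```python
import math

NOISE = -1

def dbscan(points, eps, min_samples):
    """Simplified DBSCAN (illustrative sketch): returns one label per point."""
    labels = [None] * len(points)  # None means "not visited yet"

    def neighbors(i):
        # All points within eps of point i, including i itself.
        return [j for j in range(len(points))
                if math.dist(points[i], points[j]) <= eps]

    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        nbrs = neighbors(i)
        if len(nbrs) < min_samples:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        # Point i is a core point: start a new cluster and expand it.
        labels[i] = cluster_id
        queue = [j for j in nbrs if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point: joins but does not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            j_nbrs = neighbors(j)
            if len(j_nbrs) >= min_samples:
                queue.extend(j_nbrs)  # j is also a core point: keep expanding
        cluster_id += 1
    return labels

# Toy data (assumed for illustration): two dense squares and one isolated point.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (50, 50)]
db_labels = dbscan(points, eps=1.5, min_samples=3)
print(db_labels)
```

Unlike k-means, the number of clusters is not an input here; it falls out of `eps` and `min_samples`, and the isolated point is reported as noise (label -1).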

This assignment addresses the following questions :

Perform clustering (Hierarchical, K-Means, and DBSCAN) on the airlines data to obtain the optimum number of clusters. Draw inferences from the clusters obtained.
Perform clustering (Hierarchical, K-Means, and DBSCAN) on the crime data, identify the number of clusters formed, and draw inferences.
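The questions do not say how the "optimum" number of clusters should be judged. One common option, assumed here for illustration and not confirmed as the notebooks' method, is the silhouette score: compute it for each candidate labelling and prefer the one with the higher value. The sketch below uses made-up toy points, and the function name `silhouette_score` is an illustrative choice.

```python
import math

def silhouette_score(points, labels):
    """Mean silhouette coefficient, in [-1, 1]; higher means better-separated clusters."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    def mean_dist(i, members):
        # Average distance from point i to the given members (excluding i itself).
        others = [j for j in members if j != i]
        return sum(math.dist(points[i], points[j]) for j in others) / len(others)

    scores = []
    for i, lab in enumerate(labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        a = mean_dist(i, own)  # cohesion: distance to own cluster
        b = min(mean_dist(i, members)  # separation: nearest other cluster
                for other, members in clusters.items() if other != lab)
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)

# Toy data (assumed for illustration) with one sensible and one poor labelling.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (8.0, 9.0), (9.0, 8.0)]
good = silhouette_score(points, [0, 0, 0, 1, 1, 1])
poor = silhouette_score(points, [0, 1, 0, 1, 0, 1])
print(f"good labelling: {good:.3f}, poor labelling: {poor:.3f}")
```

The same score can be computed for the labels produced by any of the three algorithms, which makes it a convenient yardstick for comparing them; noise points from DBSCAN would need to be excluded or handled separately first.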

Files in this repository :

Clustering 1.ipynb
Clustering 2.ipynb
EastWestAirlines.xlsx
README.md
crime_data.csv
