Unsupervised. The Gaussian Mixture Model (GMM) is also known as "Expectation-Maximization Clustering". The idea of GMM is simple: each point in a given dataset is generated by a linear combination of multiple multivariate Gaussians. It is a more advanced version of the K-means clustering method because it also takes the standard deviation into account.
Classifies unlabeled data (assumes the data is normally distributed)
K-Means
1. Hard clustering: each point is assigned to exactly one cluster.
2. A cluster is defined only by its mean.
3. Clusters can only be spherical.
4. It optimizes using the L2 norm.
Expectation-Maximization
1. Soft clustering: it gives the probability of each point belonging to each cluster.
2. A cluster is defined by both a mean and a standard deviation.
3. Clusters can also be elliptical.
4. It does not depend on the L2 norm; instead it is based on the expectation, i.e. the probability of a point belonging to a particular cluster. (Reliance on the L2 norm is what biases K-means towards spherical clusters.)
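The E and M steps above can be sketched in plain NumPy. This is a minimal illustration, not a production implementation: the toy data, the fixed initialization, and all variable names are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: two elliptical 2-D Gaussian blobs (hypothetical example data).
X = np.vstack([
    rng.multivariate_normal([0, 0], [[2.0, 0.8], [0.8, 0.5]], 200),
    rng.multivariate_normal([5, 5], [[0.5, -0.3], [-0.3, 1.5]], 200),
])

K, (n, d) = 2, X.shape
# Simple initialization: one data point from each blob, identity covariances.
means = X[[0, 200]].copy()
covs = np.array([np.eye(d)] * K)
weights = np.full(K, 1.0 / K)

def gaussian_pdf(X, mean, cov):
    """Density of a multivariate Gaussian evaluated at each row of X."""
    diff = X - mean
    inv = np.linalg.inv(cov)
    norm = np.sqrt((2 * np.pi) ** d * np.linalg.det(cov))
    return np.exp(-0.5 * np.einsum("ij,jk,ik->i", diff, inv, diff)) / norm

for _ in range(50):
    # E-step: responsibilities = P(cluster k | point i), the "soft" assignment.
    dens = np.stack([w * gaussian_pdf(X, m, c)
                     for w, m, c in zip(weights, means, covs)], axis=1)
    resp = dens / dens.sum(axis=1, keepdims=True)

    # M-step: re-estimate mixture weights, means, and full covariances.
    Nk = resp.sum(axis=0)
    weights = Nk / n
    means = (resp.T @ X) / Nk[:, None]
    for k in range(K):
        diff = X - means[k]
        covs[k] = (resp[:, k, None] * diff).T @ diff / Nk[k]

print(resp[0])  # soft assignment of the first point; the entries sum to 1
```

Because each cluster carries a full covariance matrix, the fitted components can stretch into ellipses, which is exactly the flexibility K-means lacks.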
If you get bad results with Gaussian mixtures, keep in mind that EM optimization only has local convergence properties, just like gradient descent: it can get stuck in a local optimum. Restarting the density estimation with different initial parameters might solve it.
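The restart strategy can be sketched as follows: run EM several times from random initializations and keep the run with the highest log-likelihood. This is a minimal 1-D sketch under assumed toy data; the function name and parameters are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy 1-D data drawn from two Gaussians (hypothetical example data).
x = np.concatenate([rng.normal(-2, 0.5, 150), rng.normal(3, 1.0, 150)])

def fit_gmm_1d(x, k=2, iters=100):
    """One EM run from a random initialization; returns params and log-likelihood."""
    mu = rng.choice(x, k, replace=False)      # random starting means
    sigma = np.full(k, x.std())
    pi = np.full(k, 1.0 / k)
    for _ in range(iters):
        # E-step: soft responsibilities under the current parameters.
        dens = pi * np.exp(-0.5 * ((x[:, None] - mu) / sigma) ** 2) \
               / (sigma * np.sqrt(2 * np.pi))
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M-step: update weights, means, and standard deviations.
        nk = resp.sum(axis=0)
        pi = nk / len(x)
        mu = (resp * x[:, None]).sum(axis=0) / nk
        sigma = np.sqrt((resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk)
    loglik = np.log(dens.sum(axis=1)).sum()
    return mu, sigma, pi, loglik

# Random restarts: keep the run with the highest log-likelihood.
best = max((fit_gmm_1d(x) for _ in range(5)), key=lambda r: r[-1])
print("best means:", np.sort(best[0]))
```

(Libraries such as scikit-learn expose the same idea through an `n_init`-style parameter on their GMM estimators.)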
- bspiering
- Siraj Raval
- OpenGenius IQ
- Towards Data Science
- Gabor Lengyel