Code for simulation studies examining the performance of the EM algorithm and two of its modifications, Classification EM and Stochastic EM, for Gaussian mixtures and mixtures of Markov chains.

jantas/Mixture_models_simulations

Mixture_models_simulation

Code used in my Master's final project.

Abstract

Finite mixture models provide a convenient framework for model-based clustering. Traditionally, the model parameters are estimated by maximum likelihood, carried out with the expectation-maximization (EM) algorithm. Such an approach to clustering has many advantages but also several pitfalls. Some of these issues can be overcome by modifying the EM algorithm. We describe two variants of the EM algorithm, namely the Classification EM (CEM) and the Stochastic EM (SEM).
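To illustrate how the three procedures differ, the sketch below shows the assignment step that sits between the E-step and the M-step. It is a minimal illustration and not the code from this repository. The function name `assignment_weights` and its arguments are hypothetical. It assumes the E-step has already produced an `(n, K)` matrix `resp` of posterior membership probabilities. Standard EM passes these soft weights to the M-step unchanged, CEM replaces each row with a hard assignment to its most probable component, and SEM replaces each row with a random draw from that row's probabilities.

```python
import numpy as np


def assignment_weights(resp, variant, rng=None):
    """Turn E-step posteriors into the weights used by the M-step.

    resp    : (n, K) array of posterior probabilities, each row summing to 1
    variant : "EM", "CEM" or "SEM" (hypothetical labels used in this sketch)
    rng     : optional numpy Generator, only used by SEM
    """
    resp = np.asarray(resp, dtype=float)
    n, K = resp.shape

    if variant == "EM":
        # Standard EM: keep the soft (fractional) memberships.
        return resp

    if variant == "CEM":
        # Classification EM: assign each point to its most probable component.
        labels = resp.argmax(axis=1)
    elif variant == "SEM":
        # Stochastic EM: draw one label per point from its posterior,
        # using inverse-CDF sampling on the cumulative row probabilities.
        rng = np.random.default_rng() if rng is None else rng
        u = rng.random(n)
        labels = (resp.cumsum(axis=1) < u[:, None]).sum(axis=1)
        labels = np.minimum(labels, K - 1)  # guard against rounding in cumsum
    else:
        raise ValueError(f"unknown variant: {variant!r}")

    # One-hot encode the hard labels so the M-step can treat them as weights.
    return np.eye(K)[labels]


if __name__ == "__main__":
    resp = np.array([[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]])
    for variant in ("EM", "CEM", "SEM"):
        print(variant)
        print(assignment_weights(resp, variant, np.random.default_rng(0)))
```

Because only this step changes, the same M-step can serve all three variants. That also makes it straightforward to switch from CEM or SEM to standard EM partway through a run.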

We study the performance of the standard EM, CEM, and SEM, measured by the adjusted Rand index, in simulation studies for two different mixtures. First, we examine a finite Gaussian mixture model, which is by far the most popular and widely studied mixture model. Based on our study, all three procedures are affected more by greater overlap among clusters than by an increase in the number of dimensions $p$, except for CEM, which appeared to be more sensitive to higher $p$. The results obtained by the standard EM were the most accurate, while CEM and SEM were faster, especially during the first couple of steps. In terms of accuracy, SEM outperformed CEM in every simulation scenario. Then, we present our results for a finite mixture of Markov chains. We conducted a simulation study similar to the one for the Gaussian mixture, but additionally studied how frequently the three procedures identify the correct number of components $K$. Based on the results, the accuracy was very similar for all three procedures, yet the standard EM appeared to identify the correct $K$ earlier (for lower sample sizes and shorter sequences) and resulted in slightly more accurate clusterings. As with Gaussian mixtures, running CEM or SEM for the first couple of steps and then switching to the standard EM seems advisable to achieve faster progress while maintaining high accuracy.
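For readers unfamiliar with the accuracy measure used above, the following sketch computes the adjusted Rand index from two label vectors via their contingency table. It is an illustrative implementation of the standard formula, not code taken from this repository. The function name `adjusted_rand_index` is an assumption, and the inputs are assumed to contain at least two observations.

```python
import numpy as np


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two partitions of the same observations."""
    labels_true = np.asarray(labels_true).ravel()
    labels_pred = np.asarray(labels_pred).ravel()
    if labels_true.shape != labels_pred.shape:
        raise ValueError("label vectors must have the same length")

    # Map arbitrary labels to 0..R-1 and 0..C-1, then build the contingency table.
    _, ti = np.unique(labels_true, return_inverse=True)
    _, pi = np.unique(labels_pred, return_inverse=True)
    table = np.zeros((ti.max() + 1, pi.max() + 1), dtype=np.int64)
    np.add.at(table, (ti, pi), 1)

    def comb2(x):
        # Number of unordered pairs that can be formed from x items.
        return x * (x - 1) / 2.0

    sum_cells = comb2(table).sum()             # pairs together in both partitions
    sum_rows = comb2(table.sum(axis=1)).sum()  # pairs together in labels_true
    sum_cols = comb2(table.sum(axis=0)).sum()  # pairs together in labels_pred

    expected = sum_rows * sum_cols / comb2(labels_true.size)
    max_index = 0.5 * (sum_rows + sum_cols)
    if max_index == expected:
        # Degenerate case, e.g. both partitions put everything in one cluster.
        return 1.0
    return float((sum_cells - expected) / (max_index - expected))


if __name__ == "__main__":
    # Identical partitions up to relabeling give an index of 1.
    print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))
    # A partition that splits every true cluster gives roughly -0.5.
    print(adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1]))
```

The index is invariant to how the clusters are numbered, which matters here because mixture components estimated by EM, CEM, or SEM come out in arbitrary order.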

Keywords: finite mixture models, clustering, EM algorithm, classification EM, stochastic EM, adjusted Rand index.
