An Improved Semi-Supervised Gaussian Mixture Model (I-SGMM)

Bakare K.A.
Torentikaza I.E.

Abstract

In the era of data-driven decision-making, the Gaussian Mixture Model (GMM) is a cornerstone of statistical modeling, particularly for clustering and density estimation. The Improved GMM offers a robust solution to a fundamental problem in clustering: determining the optimal number of clusters. Unlike the traditional GMM, it does not rely on a predetermined cluster count; instead, it employs model selection criteria such as the Bayesian Information Criterion (BIC) or the Akaike Information Criterion (AIC) to automatically identify the most suitable number of components for the given data. This adaptability makes the Improved GMM a versatile tool across a broad spectrum of applications, from market segmentation to image processing.

The Improved GMM also strengthens parameter estimation and model fitting. It leverages optimization techniques such as the Expectation-Maximization (EM) algorithm or variational inference to converge to more favorable local optima, yielding precise and reliable estimates of cluster means, covariances, and component weights. The model is particularly valuable for data of varying complexity, non-standard distributions, and clusters with differing shapes and orientations; it captures nuanced relationships within the data and provides a powerful framework for understanding complex systems.

A key differentiator of the Improved GMM is its use of a full covariance matrix for each component. This allows the model to account for intricate interdependencies between variables, which is essential for modeling real-world data effectively, and to handle non-spherical or irregularly shaped clusters, a significant limitation of the traditional GMM.
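To make the model selection step concrete, the following minimal sketch (an illustrative assumption, not the paper's I-SGMM implementation) fits a mixture for each candidate component count and keeps the one that minimizes BIC = k ln n - 2 ln L̂, where k is the number of free parameters, n the sample size, and L̂ the maximized likelihood. It uses scikit-learn's GaussianMixture as a stand-in, which estimates parameters via EM; the synthetic data and candidate range are hypothetical.

```python
# Illustrative sketch: automatic selection of the component count via BIC.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Hypothetical data: three Gaussian blobs in 2-D.
X = np.vstack([
    rng.normal(loc=(0.0, 0.0), scale=0.5, size=(200, 2)),
    rng.normal(loc=(4.0, 4.0), scale=0.8, size=(200, 2)),
    rng.normal(loc=(0.0, 5.0), scale=0.6, size=(200, 2)),
])

# Fit a GMM (via EM) for each candidate count and keep the BIC minimizer.
candidates = range(1, 9)
models = [
    GaussianMixture(n_components=k, covariance_type="full",
                    random_state=0).fit(X)
    for k in candidates
]
bics = [m.bic(X) for m in models]
best = models[int(np.argmin(bics))]
print("BIC-selected number of components:", best.n_components)
print("Component weights:", np.round(best.weights_, 3))
```

Swapping `m.bic(X)` for `m.aic(X)` applies the AIC variant instead; BIC penalizes model complexity more heavily and so tends to select fewer components.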
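The benefit of full covariance matrices can be sketched in the same hypothetical setting. The example below fits two-component mixtures with "spherical" and "full" covariance types to data containing one elongated, correlated cluster; because the full-covariance model can represent the cluster's orientation, it typically attains a higher average log-likelihood.

```python
# Illustrative comparison of covariance structures on a non-spherical cluster.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(1)
cov = np.array([[3.0, 2.5], [2.5, 3.0]])  # strongly correlated dimensions
X = np.vstack([
    rng.multivariate_normal(mean=[0.0, 0.0], cov=cov, size=300),  # elongated
    rng.normal(loc=(8.0, 0.0), scale=0.5, size=(300, 2)),         # round
])

for cov_type in ("spherical", "full"):
    gm = GaussianMixture(n_components=2, covariance_type=cov_type,
                         random_state=0).fit(X)
    # Higher (less negative) average log-likelihood indicates a better fit.
    print(f"{cov_type:>9}: mean log-likelihood = {gm.score(X):.3f}")
```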