Abstract:
Mixture models are increasingly used in the statistical analysis of datasets that
exhibit heterogeneity and/or redundancy. They are likelihood-based models, and
maximum likelihood estimates of their parameters are obtained with the expectation
maximization (EM) algorithm. Because the likelihood surface is multimodal, the EM
algorithm is highly dependent on its starting points: poorly chosen initial values
may lead it to converge to a local maximum rather than the global maximum. In this
thesis, different methods of choosing initial points for the EM algorithm are
evaluated, and two procedures are presented that make intelligent choices of
candidate starting points and evaluate their usefulness quickly. Furthermore,
several approaches for selecting the best-fitting model from a set of candidate
models for a given dataset are investigated, and some lemmas and theorems are
presented to illustrate the information criterion.
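As a rough illustration of why starting points matter, the following Python sketch (not the thesis's code) fits a univariate Gaussian mixture by EM from several random starting points and keeps the run with the highest log-likelihood, a simple random-restart baseline. The function names `em` and `multi_start_em`, the synthetic data, and the choice to take only the component means from the starting point are assumptions made for the example. Starting both means at the same value shows one way EM can get stuck: the two components stay identical and never separate.

```python
import math
import random

def log_normal_pdf(x, mu, var):
    """Log-density of the univariate normal N(mu, var) at x."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mu) ** 2 / var)

def em(data, init_means, max_iter=200, tol=1e-8, min_var=1e-6):
    """EM for a univariate k-component Gaussian mixture (hypothetical helper).

    Only the component means come from the starting point; weights start equal
    and variances start at the overall sample variance. Returns
    (log-likelihood at the final E-step, weights, means, variances)."""
    n, k = len(data), len(init_means)
    weights = [1.0 / k] * k
    means = list(init_means)
    mu = sum(data) / n
    variances = [sum((x - mu) ** 2 for x in data) / n] * k
    prev_ll = -math.inf
    for _ in range(max_iter):
        # E-step (in log space): responsibilities and current log-likelihood
        resp, ll = [], 0.0
        for x in data:
            logs = [math.log(w) + log_normal_pdf(x, m, v)
                    for w, m, v in zip(weights, means, variances)]
            top = max(logs)
            norm = top + math.log(sum(math.exp(lp - top) for lp in logs))
            ll += norm
            resp.append([math.exp(lp - norm) for lp in logs])
        if ll - prev_ll < tol:  # stop once the likelihood stops increasing
            break
        prev_ll = ll
        # M-step: closed-form updates of weights, means and variances
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-12:  # empty component: keep its parameters
                continue
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            variances[j] = max(min_var, sum(r[j] * (x - means[j]) ** 2
                                            for r, x in zip(resp, data)) / nj)
        total = sum(weights)
        weights = [w / total for w in weights]
    return ll, weights, means, variances

def multi_start_em(data, k, n_starts=5, seed=0):
    """Random restarts: draw k data points (without replacement) as starting
    means, run EM from each start, and keep the highest-likelihood run."""
    rng = random.Random(seed)
    runs = [em(data, rng.sample(data, k)) for _ in range(n_starts)]
    return max(runs, key=lambda run: run[0]), runs

# Usage on synthetic data (two well-separated clusters, assumed for illustration)
gen = random.Random(1)
data = ([gen.gauss(-3.0, 1.0) for _ in range(150)]
        + [gen.gauss(3.0, 1.0) for _ in range(150)])
best, runs = multi_start_em(data, k=2)
stuck = em(data, [0.0, 0.0])  # symmetric start: both means stay equal
print("best log-likelihood:", round(best[0], 2),
      "means:", [round(m, 2) for m in best[2]])
print("symmetric-start log-likelihood:", round(stuck[0], 2))
```

Random restarts are only a baseline: every candidate start costs a full EM run, which is what motivates procedures that choose candidate starting points intelligently and screen them cheaply.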
This work introduces two novel heuristic methods for choosing the best starting
points for the EM algorithm, named the Combined method and Hybrid PSO (Particle
Swarm Optimisation). The Combined method is based on a combination of two
clustering methods and, compared with the other initialisation methods, finds
the best starting points for the EM algorithm. Hybrid PSO combines PSO, as a
global optimisation approach, with the EM algorithm, as a local search, to
overcome the EM algorithm's dependence on starting points. Finally, Hybrid PSO
is compared with different methods of choosing starting points for the EM
algorithm.
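The general idea of combining PSO with EM can be sketched as follows, assuming a simple memetic design rather than the thesis's exact Hybrid PSO: each particle encodes the vector of component means of a univariate Gaussian mixture, a few EM iterations act as the local search from the particle's position, the particle moves to the EM-refined point, and the standard PSO velocity update (inertia `w`, coefficients `c1` and `c2`) explores globally. The encoding, the parameter values, and the names `hybrid_pso_em` and `em_steps` are hypothetical.

```python
import math
import random

def log_normal_pdf(x, mu, var):
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mu) ** 2 / var)

def em_steps(data, means, n_steps, min_var=1e-6):
    """Run n_steps EM iterations for a 1-D Gaussian mixture starting from `means`
    (equal weights, pooled variance). Returns (log-likelihood from the last
    E-step, refined means)."""
    n, k = len(data), len(means)
    means, weights = list(means), [1.0 / k] * k
    mu = sum(data) / n
    variances = [sum((x - mu) ** 2 for x in data) / n] * k
    ll = -math.inf
    for _ in range(n_steps):
        resp, ll = [], 0.0
        for x in data:  # E-step in log space for numerical stability
            logs = [math.log(w) + log_normal_pdf(x, m, v)
                    for w, m, v in zip(weights, means, variances)]
            top = max(logs)
            norm = top + math.log(sum(math.exp(lp - top) for lp in logs))
            ll += norm
            resp.append([math.exp(lp - norm) for lp in logs])
        for j in range(k):  # M-step
            nj = sum(r[j] for r in resp)
            if nj < 1e-12:
                continue
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            variances[j] = max(min_var, sum(r[j] * (x - means[j]) ** 2
                                            for r, x in zip(resp, data)) / nj)
        total = sum(weights)
        weights = [w / total for w in weights]
    return ll, means

def hybrid_pso_em(data, k, n_particles=8, n_iter=10, em_per_eval=3,
                  w=0.7, c1=1.5, c2=1.5, seed=0):
    """PSO as global search over the component means, EM as local search."""
    rng = random.Random(seed)
    lo, hi = min(data), max(data)
    pos = [[rng.uniform(lo, hi) for _ in range(k)] for _ in range(n_particles)]
    vel = [[0.0] * k for _ in range(n_particles)]
    pbest, pbest_fit = [None] * n_particles, [-math.inf] * n_particles
    gbest, gbest_fit = None, -math.inf
    for _ in range(n_iter):
        for i in range(n_particles):
            # Local search: a few EM steps; the particle moves to the refined point
            fit, pos[i] = em_steps(data, pos[i], em_per_eval)
            if fit > pbest_fit[i]:
                pbest_fit[i], pbest[i] = fit, list(pos[i])
            if fit > gbest_fit:
                gbest_fit, gbest = fit, list(pos[i])
        for i in range(n_particles):  # standard PSO velocity/position update
            for d in range(k):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
    # Final polish: a longer EM run from the best point the swarm found
    return em_steps(data, gbest, 100)

# Usage on synthetic data (assumed for illustration)
gen = random.Random(1)
data = ([gen.gauss(-3.0, 1.0) for _ in range(150)]
        + [gen.gauss(3.0, 1.0) for _ in range(150)])
ll, means = hybrid_pso_em(data, k=2)
ll_sym, _ = em_steps(data, [0.0, 0.0], 100)  # a poor single start, for comparison
print("hybrid log-likelihood:", round(ll, 2), "means:", [round(m, 2) for m in means])
print("symmetric-start log-likelihood:", round(ll_sym, 2))
```

Because the swarm explores many regions of the parameter space, the final EM run starts from the best point found rather than from a single arbitrary initial value; the symmetric start is included only for comparison.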