On finite mixture modeling and model-based clustering of multivariate categorical sequences

Volodymyr Melnykov Co-Author
University of Alabama
 
Yingying Zhang First Author
Western Michigan University
 
Yingying Zhang Presenting Author
Western Michigan University
 
Sunday, Aug 3: 3:35 PM - 3:50 PM
0902 
Contributed Papers 
Music City Center 
Clustering algorithms for quantitative data have been explored extensively in the literature. However, many real-life applications involve qualitative data, and the range of clustering procedures available in this framework is very limited. Categorical sequences have recently attracted the attention of researchers. Several existing methods for the analysis of such data have been developed for univariate sequences. Often, however, observations take the form of multivariate categorical sequences, and there is currently a lack of models developed for this framework. Analyzing several univariate sequences separately ignores possible effects of the sequences on each other and poses challenges in aggregating the obtained results. In this paper, we propose a novel mixture model for multivariate categorical sequences that can effectively model heterogeneity in the data and reflect its dynamic nature. As we demonstrate in a series of simulation studies, the developed mixture model shows good model-based clustering performance. Applying the method to the British Household Panel Survey data set produces meaningful results.
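
The abstract does not specify the proposed model, so the following is only an illustrative sketch of the general idea named in the keywords: a finite mixture of first-order Markov chains fitted with the EM algorithm. It assumes, purely for illustration, that the vector of categories observed at each time point is encoded as a single composite state, which is one simple way to let the component sequences influence each other. The function names (encode_joint, transition_counts, fit_markov_mixture), the smoothing constant alpha, and the random soft initialization are all hypothetical choices and are not taken from the paper.

```python
import numpy as np

def encode_joint(seqs, n_cats):
    """Map an (n, T, p) array of category codes to (n, T) composite states.

    Assumption: every variable has the same number of categories, n_cats.
    """
    seqs = np.asarray(seqs)
    weights = n_cats ** np.arange(seqs.shape[2])  # base-n_cats positional weights
    return seqs @ weights

def transition_counts(states, n_states):
    """Count observed transitions for each sequence; returns (n, S, S)."""
    n = states.shape[0]
    counts = np.zeros((n, n_states, n_states))
    for i in range(n):
        np.add.at(counts[i], (states[i, :-1], states[i, 1:]), 1)
    return counts

def fit_markov_mixture(states, n_states, K, n_iter=100, seed=0, alpha=1e-2):
    """EM for a K-component mixture of first-order Markov chains.

    alpha is a small additive smoothing term (an assumption of this sketch)
    that keeps all probabilities strictly positive so the logs stay finite.
    """
    rng = np.random.default_rng(seed)
    n = states.shape[0]
    counts = transition_counts(states, n_states)
    first = np.eye(n_states)[states[:, 0]]        # one-hot initial states, (n, S)
    resp = rng.dirichlet(np.ones(K), size=n)      # random soft assignments
    for _ in range(n_iter):
        # M-step: weighted relative frequencies (with smoothing)
        pi = (resp.sum(axis=0) + alpha) / (n + K * alpha)
        init = resp.T @ first + alpha
        init /= init.sum(axis=1, keepdims=True)
        trans = np.einsum("nk,nab->kab", resp, counts) + alpha
        trans /= trans.sum(axis=2, keepdims=True)
        # E-step: posterior cluster probabilities, computed on the log scale
        logp = (np.log(pi)[None, :]
                + first @ np.log(init).T
                + np.einsum("nab,kab->nk", counts, np.log(trans)))
        logp -= logp.max(axis=1, keepdims=True)
        resp = np.exp(logp)
        resp /= resp.sum(axis=1, keepdims=True)
    return pi, init, trans, resp
```

Under these assumptions, a cluster label for each multivariate sequence could be read off as resp.argmax(axis=1). The composite-state encoding grows exponentially with the number of variables, so this sketch is practical only for a few variables with few categories each.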

Keywords

EM algorithm

Finite mixture model

Markov model

Model-based clustering

Multivariate categorical sequences

Main Sponsor

Section on Statistical Computing