Unsupervised Learning in a General Semiparametric Clusterwise Elliptical Distribution Model: Efficient Estimation, Optimal Classification, and Consistent Cluster Selection

Chin-Tsang Chiang Co-Author
National Taiwan University
 
Ming-Yueh Huang Co-Author
Academia Sinica
 
Jen-Chieh Teng Co-Author
 
Sheng-Hsin Fan First Author
National Taiwan University
 
Sheng-Hsin Fan Presenting Author
National Taiwan University
 
Thursday, Aug 7: 10:05 AM - 10:20 AM
0700 
Contributed Papers 
Music City Center 
This study introduces a general semiparametric clusterwise elliptical distribution model to examine the influence of latent clusters on observed continuous variables. The proposed method combines a weighted sum of squares with a separation penalty to jointly partition individuals and estimate model parameters, with a heuristic solution method supplying the initial values. The resulting consistent partition estimator forms the foundation for a pseudo maximum likelihood estimation procedure and a Bayesian classification rule, both of which iteratively update the partition and model parameter estimators. The partition estimator achieves optimal classification, while the model parameter estimators attain the semiparametric efficiency bound. A key contribution of this work is the development of semiparametric information criteria for determining the number of clusters, which ensure consistent cluster selection. Simulation studies and data analyses demonstrate the effectiveness of the proposed methodology.
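
For readers less familiar with the terminology, the following is a minimal sketch of the standard textbook definitions behind the terms "density generator" and "Bayesian classification rule". The symbols K, \pi_k, \mu_k, \Sigma_k, g, and p are notation assumed here for illustration only. The abstract does not give the authors' notation, does not say whether the generator is shared across clusters, and does not state the exact form of their criteria, so this should not be read as their formulation.

```latex
% Assumed notation (not from the abstract): K clusters, mixing proportions \pi_k,
% location \mu_k, scatter matrix \Sigma_k, and a density generator g on [0, \infty).

% Density of a p-dimensional elliptical distribution in cluster k:
\[
  f_k(x) \;=\; |\Sigma_k|^{-1/2}\,
  g\!\left( (x-\mu_k)^{\top} \Sigma_k^{-1} (x-\mu_k) \right),
  \qquad x \in \mathbb{R}^p .
\]

% Bayes classification rule: assign x to the cluster with the largest
% posterior probability, which is proportional to \pi_k f_k(x):
\[
  \hat{c}(x) \;=\; \arg\max_{1 \le k \le K} \; \pi_k\, f_k(x) .
\]

% Special case for orientation: the multivariate normal distribution
% corresponds to the generator
\[
  g(u) \;=\; (2\pi)^{-p/2} \exp(-u/2), \qquad u \ge 0 .
\]
```

In a fully parametric model such as a Gaussian mixture, g is fixed in advance. A semiparametric model of the kind named in the title would presumably leave g unspecified, which is why the density generator appears among the keywords.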

Keywords

Clusterwise elliptical distribution

Density generator

Pseudo maximum likelihood

Semiparametric efficiency

Semiparametric information criterion

Separation penalty 

Main Sponsor

Section on Statistical Learning and Data Science