Factor Adjusted Spectral Clustering for Mixture Models

Soham Jana Co-Author
University of Notre Dame
 
Jianqing Fan Co-Author
Princeton University
 
Shange Tang First Author
Princeton University
 
Soham Jana Presenting Author
University of Notre Dame
 
Monday, Aug 4: 10:50 AM - 11:05 AM
2173 
Contributed Papers 
Music City Center 
This paper studies a factor-modeling-based approach to clustering high-dimensional data. Statistical models with correlated structures pervade modern applications in economics, finance, genomics, wireless sensing, and other fields. Standard techniques for high-dimensional clustering, such as the naive spectral method, often perform poorly in highly correlated settings. To address this problem, we propose the Factor Adjusted Spectral Clustering (FASC) algorithm, which adds a data-denoising step that removes the factor component to cope with dependence in the data. We prove that, under general assumptions, FASC achieves a mislabeling rate that decays exponentially in the signal-to-noise ratio. These assumptions bridge many classical factor models in the literature, including the pervasive, weak, and sparse factor models. FASC is also efficient, requiring only near-linear sample complexity in the data dimension. We further demonstrate the applicability of FASC through numerical studies and real-data experiments, showing that it performs well in many cases where traditional spectral clustering fails.
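The denoise-then-cluster idea can be illustrated with a minimal NumPy sketch. The abstract does not spell out the estimator, so everything below is an assumption rather than the authors' implementation: data are simulated from a hypothetical two-cluster pervasive factor model, x_i = ±mu + B f_i + u_i; the number of factors r and the number of clusters k are taken as known; the factor component is estimated by PCA and projected out; and clustering is done by a spectral embedding followed by k-means. All function names (`simulate`, `factor_adjust`, `spectral_embed`, `kmeans`, `fasc_sketch`, `mislabel_rate`) are hypothetical.

```python
import itertools
import numpy as np


def simulate(n=400, p=200, r=2, delta=3.0, seed=0):
    """Hypothetical data from x_i = +/-mu + B f_i + u_i (assumed pervasive factor model)."""
    rng = np.random.default_rng(seed)
    labels = rng.integers(0, 2, size=n)
    mu = np.zeros(p)
    mu[:10] = delta / np.sqrt(10)                  # cluster mean with ||mu|| = delta
    means = np.where(labels[:, None] == 1, mu, -mu)
    B = rng.standard_normal((p, r))                # every coordinate loads on the factors
    F = rng.standard_normal((n, r))                # latent factors shared across coordinates
    U = rng.standard_normal((n, p))                # idiosyncratic noise
    return means + F @ B.T + U, labels


def kmeans(Z, k, n_init=10, n_iter=100, seed=0):
    """Plain Lloyd's algorithm with k-means++ seeding; returns the lowest-inertia labels."""
    rng = np.random.default_rng(seed)
    best_labels, best_inertia = None, np.inf
    for _ in range(n_init):
        centers = [Z[rng.integers(len(Z))]]
        for _ in range(1, k):
            d2 = np.min(((Z[:, None, :] - np.array(centers)[None]) ** 2).sum(-1), axis=1)
            centers.append(Z[rng.choice(len(Z), p=d2 / d2.sum())])
        centers = np.array(centers)
        for _ in range(n_iter):
            assign = ((Z[:, None, :] - centers[None]) ** 2).sum(-1).argmin(axis=1)
            new = np.array([Z[assign == j].mean(0) if np.any(assign == j) else centers[j]
                            for j in range(k)])
            if np.allclose(new, centers):
                break
            centers = new
        inertia = ((Z - centers[assign]) ** 2).sum()
        if inertia < best_inertia:
            best_labels, best_inertia = assign, inertia
    return best_labels


def spectral_embed(X, k):
    """Naive spectral step: scores of the centered data on its top-k principal directions."""
    Xc = X - X.mean(axis=0)
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :k] * S[:k]


def factor_adjust(X, r):
    """Assumed denoising step: estimate the factor part B f_i by PCA and subtract it."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    V = Vt[:r].T                                   # estimated span of the loadings B
    return Xc - Xc @ V @ V.T                       # residuals after removing the factors


def fasc_sketch(X, k, r):
    """Factor adjustment, then spectral embedding, then k-means."""
    return kmeans(spectral_embed(factor_adjust(X, r), k), k)


def mislabel_rate(pred, labels, k):
    """Fraction of mislabeled points, minimized over relabelings of the k clusters."""
    return min(np.mean(np.array(perm)[pred] != labels)
               for perm in itertools.permutations(range(k)))


X, z = simulate()
naive = kmeans(spectral_embed(X, 2), 2)            # top directions are dominated by factors
fasc = fasc_sketch(X, k=2, r=2)
print("naive spectral :", mislabel_rate(naive, z, 2))
print("factor-adjusted:", mislabel_rate(fasc, z, 2))
```

In this simulated setting, the top principal directions of the raw data are driven by the factors, which are independent of the cluster labels, so the naive embedding splits the data essentially at random. After the factor directions are projected out, the leading direction of the residuals aligns with the cluster mean. Using PCA to estimate the factor component is only one plausible choice for a pervasive factor model; weak or sparse factor models may call for a different estimator.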

Keywords

Dependency modeling

dimensionality reduction

data denoising

mislabeling 

Main Sponsor

Section on Statistical Learning and Data Science