Factor Adjusted Spectral Clustering for Mixture Models
Soham Jana
Presenting Author
University of Notre Dame
Monday, Aug 4: 10:50 AM - 11:05 AM
2173
Contributed Papers
Music City Center
This paper studies a factor-model-based approach to clustering high-dimensional data. Statistical modeling with correlated structures pervades modern applications in economics, finance, genomics, and wireless sensing. Standard techniques for high-dimensional clustering, such as the naive spectral method, often fail to yield good results when the data are highly correlated. To address this problem, we propose the Factor Adjusted Spectral Clustering (FASC) algorithm, which adds a data denoising step that eliminates the factor component to cope with data dependency. We prove that, under general assumptions, the FASC algorithm achieves a mislabeling rate that decays exponentially in the signal-to-noise ratio. Our assumptions bridge many classical factor models in the literature, such as the pervasive factor model, the weak factor model, and the sparse factor model. FASC is also efficient, requiring only near-linear sample complexity with respect to the data dimension. We also demonstrate the applicability of FASC through real data experiments and numerical studies, and show that FASC performs well in many cases where traditional spectral clustering fails.
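The two-stage idea described in the abstract (remove the factor component, then cluster spectrally) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that the factor component is estimated by the top r principal directions of the centered data, that the number of factors r and the number of clusters k are known, and that a plain Lloyd-style k-means is an acceptable final step. The function names remove_factors, kmeans, and fasc_sketch are hypothetical.

```python
import numpy as np


def remove_factors(X, r):
    """Project the centered data onto the complement of its top-r principal
    directions. ASSUMPTION: PCA is used as the factor estimator here; the
    abstract does not specify the estimator."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are right singular vectors, i.e. directions in feature space.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    V = Vt[:r].T  # d x r matrix of estimated factor directions
    return Xc - Xc @ V @ V.T


def kmeans(Z, k, n_iter=50, seed=0):
    """Plain Lloyd iterations with random initial centers (illustrative only)."""
    rng = np.random.default_rng(seed)
    centers = Z[rng.choice(len(Z), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared distance from every point to every center, shape (n, k).
        d2 = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):  # leave an empty cluster's center unchanged
                centers[j] = Z[labels == j].mean(axis=0)
    return labels


def fasc_sketch(X, k, r):
    """Factor adjustment followed by spectral clustering (hypothetical sketch)."""
    U = remove_factors(X, r)
    Uc = U - U.mean(axis=0)
    # Spectral step: embed each sample in the top-k left singular subspace.
    left, s, _ = np.linalg.svd(Uc, full_matrices=False)
    Z = left[:, :k] * s[:k]
    return kmeans(Z, k)
```

Calling fasc_sketch(X, k=2, r=3) on an n-by-d data matrix returns an integer label in {0, 1} for each row. Note that removing principal directions can also remove part of the cluster signal when the cluster means align with them, so the choice of r matters in this sketch.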
Dependency modeling
dimensionality reduction
data denoising
mislabeling
Main Sponsor
Section on Statistical Learning and Data Science