Topological Clustering with Covariate Selection

Jian Yin Co-Author
City University of Hong Kong
 
Parth Desai Co-Author
University of California, Berkeley
 
Rahul Ghosal Co-Author
Arnold School of Public Health, University of South Carolina
 
Yuan Wang Co-Author
Arnold School of Public Health, University of South Carolina
 
Jiaying Yi First Author
University of South Carolina
 
Jiaying Yi Presenting Author
University of South Carolina
 
Monday, Aug 4: 12:05 PM - 12:20 PM
2537 
Contributed Papers 
Music City Center 
Topological data analysis (TDA) is a powerful tool for detecting hidden structures in complex data such as biological signals and networks. A key TDA algorithm, persistent homology (PH), captures multi-scale topological features that are robust to noise and summarizes them in persistence diagrams (PDs). However, the non-Euclidean nature of PDs complicates traditional statistical analysis. Recent topological inference methods use the heat kernel (HK) expansion of PDs in multi-group permutation tests. Extending these methods, we develop a topological clustering framework based on the HK expansion of PDs. This flexible framework allows Euclidean covariates to be incorporated into topological clustering. Building on it, we develop an automated, data-driven procedure for selecting the optimal number of topological clusters and the covariates most significantly associated with them. We demonstrate the method's effectiveness in detecting clusters with varying degrees of topological dissimilarity through simulations and applications to brain signals and networks.
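The general idea of clustering on topological features combined with Euclidean covariates, with the number of clusters chosen from the data, can be illustrated with a minimal sketch. It is not the authors' method: the array hk_coeffs is a synthetic placeholder for HK expansion coefficients (no persistent homology or heat kernel expansion is computed), the covariates are simulated, and plain k-means with a mean silhouette criterion is swapped in as one possible data-driven selection rule. All function and variable names are hypothetical.

```python
import numpy as np

def zscore(A):
    # Standardize each column so both feature blocks share a common scale.
    return (A - A.mean(axis=0)) / A.std(axis=0)

def farthest_first_centers(X, k):
    # Deterministic initialization: start at the first row, then repeatedly
    # add the point farthest from all centers chosen so far.
    centers = [X[0]]
    for _ in range(k - 1):
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[d.argmax()])
    return np.array(centers)

def kmeans(X, k, n_iter=100):
    # Plain Lloyd iterations; returns one cluster label per row of X.
    centers = farthest_first_centers(X, k)
    for _ in range(n_iter):
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

def mean_silhouette(X, labels):
    # Average silhouette width; larger values indicate better separation.
    uniq = np.unique(labels)
    if len(uniq) < 2:
        return -1.0
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2))
    scores = []
    for i in range(len(X)):
        same = labels == labels[i]
        if same.sum() == 1:
            scores.append(0.0)  # convention for singleton clusters
            continue
        a = D[i, same].sum() / (same.sum() - 1)
        b = min(D[i, labels == c].mean() for c in uniq if c != labels[i])
        scores.append((b - a) / max(a, b))
    return float(np.mean(scores))

# Synthetic data (assumption): three groups of 30 subjects each.
rng = np.random.default_rng(0)
n = 30
true_groups = np.repeat(np.arange(3), n)
hk_means = np.array([[0, 0, 0, 0], [4, 0, 4, 0], [0, 4, 0, 4]], dtype=float)
cov_means = np.array([[0, 0], [3, 0], [0, 3]], dtype=float)
# Placeholder for HK expansion coefficients of each subject's PD.
hk_coeffs = hk_means[true_groups] + rng.normal(scale=0.5, size=(3 * n, 4))
# Simulated Euclidean covariates.
covariates = cov_means[true_groups] + rng.normal(scale=0.5, size=(3 * n, 2))

# Concatenate the two blocks and pick k by the highest mean silhouette.
X = np.hstack([zscore(hk_coeffs), zscore(covariates)])
scores = {k: mean_silhouette(X, kmeans(X, k)) for k in range(2, 7)}
best_k = max(scores, key=scores.get)
labels = kmeans(X, best_k)
print(best_k, scores)
```

Because a fixed-length coefficient vector lives in Euclidean space, it can be stacked directly with covariates, which is the property such a sketch relies on; how the blocks should be weighted against each other is a design choice left open here.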

Keywords

Topological data analysis

Topological clustering

Heat kernel expansion 

Main Sponsor

Section on Statistical Learning and Data Science