Soft Post-Clustering Inference with EHR Applications

Anru Zhang, Co-Author
Duke University

Zihan Zhu, Co-Author
The Wharton School of the University of Pennsylvania

Qiuyi Wu, Speaker
Duke University

Wednesday, Aug 6: 10:35 AM - 10:55 AM
Topic-Contributed Paper Session
Music City Center

Clustering methods play a crucial role in Electronic Health Records (EHR) research, where patient subpopulations often exhibit complex structure. Traditional clustering makes hard assignments, placing each patient in exactly one cluster. Soft clustering techniques such as Fuzzy C-Means (FCM) instead give each patient a degree of membership in every cluster, with memberships summing to one, and thereby capture the uncertainty inherent in the assignments. However, statistical inference after soft clustering remains underexplored.
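For readers less familiar with FCM, a minimal NumPy sketch of the standard algorithm (alternating centroid and membership updates with fuzzifier m) shows where the soft memberships come from: each row of the membership matrix U sums to one, and each centroid is a membership-weighted mean of the data. The function name `fuzzy_c_means` and its defaults (m = 2, the tolerance, random initialization) are illustrative assumptions, not the authors' implementation, and the sketch contains none of the inference machinery described in this work.

```python
import numpy as np

def fuzzy_c_means(X, n_clusters, m=2.0, max_iter=300, tol=1e-6, seed=0):
    """Illustrative sketch of standard Fuzzy C-Means (not the authors' code).

    Returns centroids V (K, p) and memberships U (n, K); each row of U sums to 1.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Random initial memberships, normalized so each row sums to one.
    U = rng.random((n, n_clusters))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(max_iter):
        W = U ** m                                   # fuzzified membership weights
        V = (W.T @ X) / W.sum(axis=0)[:, None]       # membership-weighted centroids
        # Squared Euclidean distance from every point to every centroid.
        D2 = ((X[:, None, :] - V[None, :, :]) ** 2).sum(axis=2)
        D2 = np.fmax(D2, 1e-12)                      # guard against zero distances
        # u_ik = 1 / sum_j (d_ik / d_ij)^(2/(m-1)), computed via (d^2)^(-1/(m-1)).
        inv = D2 ** (-1.0 / (m - 1))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return V, U
```

With two well-separated groups and m = 2, most memberships end up close to 0 or 1; larger values of m push memberships toward 1/K and make the assignments softer.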
In this work, we introduce a novel post-clustering inference framework for FCM that enables hypothesis testing and uncertainty quantification for soft cluster assignments. Specifically, we extend standard FCM with a weighted formulation in which highly similar clusters are identified and reweighted. For instance, when several clusters have nearly identical centroids, they can be reweighted to reflect their collective contribution, so that redundant splits of a single subpopulation do not distort the inferred clustering structure. Because this formulation accounts for clusters contributing unequally to the overall structure, it improves both interpretability and robustness.
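The abstract does not specify how similar clusters are reweighted, so the following is only a hypothetical illustration of the idea: clusters whose centroids lie within a tolerance `tol` of each other are grouped (single linkage), their membership columns are summed, and each group receives a weight equal to its average membership across patients. The function `pool_similar_clusters`, the distance threshold, and this weighting rule are assumptions made for illustration, not the weighted FCM formulation of this work.

```python
import numpy as np

def pool_similar_clusters(V, U, tol):
    """Hypothetical illustration (not the authors' method).

    V: (K, p) centroids; U: (n, K) memberships whose rows sum to 1.
    Clusters with centroids closer than `tol` are grouped by single linkage.
    Returns pooled centroids, pooled memberships, and per-group weights.
    """
    K = V.shape[0]
    labels = np.arange(K)                      # each cluster starts in its own group
    for a in range(K):
        for b in range(a + 1, K):
            if np.linalg.norm(V[a] - V[b]) < tol:
                # Move every cluster in b's group into a's group.
                labels[labels == labels[b]] = labels[a]
    groups = np.unique(labels)
    mass = U.sum(axis=0)                       # total membership mass of each cluster
    # Summing membership columns keeps each pooled row summing to 1.
    U_pooled = np.column_stack([U[:, labels == g].sum(axis=1) for g in groups])
    # Pooled centroid: mass-weighted average of the member centroids.
    V_pooled = np.vstack([
        mass[labels == g] @ V[labels == g] / mass[labels == g].sum() for g in groups
    ])
    weights = U_pooled.mean(axis=0)            # share of total membership; sums to 1
    return V_pooled, U_pooled, weights
```

Because pooled rows still sum to one, two near-duplicate clusters that split one subpopulation are reported as a single soft cluster carrying their combined weight.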
We present theoretical properties of the framework, simulation studies, and an application to real-world EHR data, demonstrating how the weighted FCM framework improves clustering inference. Our approach provides a principled way to conduct hypothesis testing in soft clustering, offering new insights for data-driven decision-making in biomedical and health informatics applications.