Sparse Convex Biclustering

Chenliang Gu Co-Author
Center for Statistics and Data Science, Beijing Normal University
 
Binhuan Wang Co-Author
AbbVie
 
Jiakun Jiang First Author
Beijing Normal University at Zhuhai
 
Binhuan Wang Presenting Author
AbbVie
 
Thursday, Aug 7: 12:05 PM - 12:20 PM
2259 
Contributed Papers 
Music City Center 

Description

Biclustering is an unsupervised machine-learning technique that simultaneously clusters the rows and columns of a data matrix. It has attracted growing attention over the past two decades, driven by the increasing complexity and volume of data in fields such as genomics, transcriptomics, and other high-throughput omics technologies. However, discovering significant biclusters in large-scale datasets is an NP-hard problem. The accuracy and stability of most existing biclustering algorithms decrease significantly as dataset size increases. This is mainly due to the accumulation of noise in high-dimensional features and to the non-convex optimization formulations of these algorithms. To address this, we propose a new method called sparse convex biclustering (SCB), which shrinks the noise to zero through penalization during biclustering. A tuning criterion based on clustering stability is developed to optimally balance cluster fitting and sparsity. We conduct comprehensive numerical studies on simulated data to demonstrate the superior performance of SCB compared with several state-of-the-art alternatives. Furthermore, we apply our method to the analysis of mouse olfactory bulb (MOB) data.

Keywords

Convex biclustering

Sparsity

ADMM

High-dimensional data 

Main Sponsor

Section on Statistical Learning and Data Science