Sparse principal component analysis via double thresholding with applications in pseudo-bulk expression data
Jing Lei
Co-Author
Carnegie Mellon University
Qi Xu
Speaker
Carnegie Mellon University
Sunday, Aug 3: 5:05 PM - 5:25 PM
Topic-Contributed Paper Session
Music City Center
We study the problem of principal component estimation in high-dimensional settings, where the leading principal components exhibit both group and individual sparsity. This simultaneous sparsity structure is commonly observed in multi-cell-type gene expression data, where the same genes are often expressed across related cell subtypes in biological processes. To incorporate this structure into PCA, we propose a double-thresholding algorithm that first selects the groups carrying signal via group thresholding, then applies individual thresholding within each selected group to enforce individual sparsity. Our algorithm is computationally efficient and scalable, making it well-suited for high-dimensional gene expression analysis. Furthermore, we establish the consistency and convergence rate of the resulting estimator. Experiments on both simulated and real datasets demonstrate the effectiveness of our approach.
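The two-stage thresholding described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the estimator, the thresholding rule, the threshold levels, or the initialization, so everything below is an assumption made for illustration: hard thresholding at both stages, a thresholded power iteration on a covariance matrix, and the hypothetical names `double_threshold`, `sparse_pc`, `group_tau`, and `indiv_tau`. It should not be read as the authors' algorithm.

```python
import numpy as np


def double_threshold(v, groups, group_tau, indiv_tau):
    """Zero out weak groups, then weak entries inside the surviving groups.

    Hypothetical helper: hard thresholding at both stages is an assumption.
    """
    out = np.zeros_like(v, dtype=float)
    for idx in groups:
        block = v[idx]
        # Group thresholding: keep a group only if its Euclidean norm is large.
        if np.linalg.norm(block) > group_tau:
            # Individual thresholding: keep only large entries within the group.
            out[idx] = np.where(np.abs(block) > indiv_tau, block, 0.0)
    return out


def sparse_pc(S, groups, group_tau, indiv_tau, n_iter=50):
    """Leading sparse principal component via thresholded power iteration.

    Assumed scheme (not taken from the abstract): start from the leading
    eigenvector of S, then alternate multiply, normalize, threshold.
    """
    _, vecs = np.linalg.eigh(S)  # eigenvalues come back in ascending order
    v = vecs[:, -1]
    for _ in range(n_iter):
        w = S @ v
        w = w / np.linalg.norm(w)
        w = double_threshold(w, groups, group_tau, indiv_tau)
        norm = np.linalg.norm(w)
        if norm == 0:
            # Thresholds removed everything; return the last iterate.
            break
        v = w / norm
    return v


# Toy example: 6 variables in 3 groups of 2; only the first group has signal.
groups = [np.array([0, 1]), np.array([2, 3]), np.array([4, 5])]
u = np.array([1.0, 1.0, 0.0, 0.0, 0.0, 0.0]) / np.sqrt(2)
S = 5.0 * np.outer(u, u) + np.eye(6)  # spiked covariance with sparse spike
v_hat = sparse_pc(S, groups, group_tau=0.1, indiv_tau=0.1)
```

In this sketch the group stage discards whole blocks of variables before the entrywise stage runs, so the per-iteration cost is dominated by the matrix-vector product. In practice the two thresholds would be chosen from the noise level and the dimensions, which the abstract does not state.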