46: Extending Sparse CCA for Multi-Population, Multi-Feature Integration

Quefeng Li Co-Author
University of North Carolina Chapel Hill
 
Yuchao Jiang Co-Author
Texas A&M University
 
Renee Ge First Author
 
Renee Ge Presenting Author
 
Tuesday, Aug 5: 2:00 PM - 3:50 PM
1922 
Contributed Posters 
Music City Center 
Sparse canonical correlation analysis (SCCA) identifies sparse linear combinations of two sets of features that are highly correlated with each other. Although several SCCA methods extend this framework to more than two datasets, they assume that the different feature sets are measured within the same population. Here, we propose an extension of SCCA designed for settings with four data matrices derived from two distinct populations, each with two different feature sets. We reframe the correlation maximization problem as a minimization problem and decompose the original canonical weights into two separate components that capture the shared and unique variance of each dataset. In simulations, our method improves recovery of the true canonical weights relative to naïve methods that disregard either the shared or the unique component. For real data analysis, we apply our method to integrate two single-cell multiomic datasets of peripheral blood mononuclear cells with simultaneous measurements of RNA expression and chromatin accessibility, and we benchmark its performance against widely used single-cell integration pipelines such as Seurat and Signac.
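
The reframing of correlation maximization as a minimization can be illustrated with a standard identity, sketched below. All notation here (the matrices X and Y, the weights u and v, the population index k, the shared components u_0 and v_0, the population-specific components \delta_k, and the penalty levels \lambda_0 and \lambda_1) is assumed for illustration and does not come from the abstract; the authors' actual objective, constraints, and penalties may differ.

```latex
% Illustrative notation only (assumed, not taken from the abstract).
% X \in \mathbb{R}^{n \times p} and Y \in \mathbb{R}^{n \times q} are
% column-centered data matrices measured on the same n samples.

% Classical CCA as a maximization:
\max_{u,\,v} \; u^\top X^\top Y v
\quad \text{s.t.} \quad \|Xu\|_2^2 = \|Yv\|_2^2 = 1 .

% Under these constraints the squared distance between the canonical variates is
\|Xu - Yv\|_2^2
  = \|Xu\|_2^2 + \|Yv\|_2^2 - 2\,u^\top X^\top Y v
  = 2 - 2\,u^\top X^\top Y v ,

% so the maximization above is equivalent to the minimization
\min_{u,\,v} \; \|Xu - Yv\|_2^2
\quad \text{s.t.} \quad \|Xu\|_2^2 = \|Yv\|_2^2 = 1 .

% One hypothetical shared-plus-unique decomposition for populations k = 1, 2:
u_k = u_0 + \delta^{u}_k , \qquad v_k = v_0 + \delta^{v}_k ,

% with a hypothetical sparse objective over the four matrices X_1, Y_1, X_2, Y_2:
\min_{u_0,\,v_0,\,\delta} \;
  \sum_{k=1}^{2} \|X_k u_k - Y_k v_k\|_2^2
  + \lambda_0 \bigl( \|u_0\|_1 + \|v_0\|_1 \bigr)
  + \lambda_1 \sum_{k=1}^{2} \bigl( \|\delta^{u}_k\|_1 + \|\delta^{v}_k\|_1 \bigr)
\quad \text{s.t.} \quad \|X_k u_k\|_2^2 = \|Y_k v_k\|_2^2 = 1 , \; k = 1, 2 .
```

In this sketch, setting every \delta_k to zero corresponds to a naïve fit that keeps only the shared component, while setting u_0 = v_0 = 0 corresponds to fitting each population separately.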

Keywords

Sparse Canonical Correlation Analysis

Data Integration

Variance Decomposition

Single-Cell Multiomics 

Main Sponsor

Section on Statistics in Genomics and Genetics