
46: Extending Sparse CCA for Multi-Population, Multi-Feature Integration

Presented During: Contributed Poster Presentations: Section on Statistics in Genomics and Genetics

Quefeng Li Co-Author
University of North Carolina Chapel Hill

Yuchao Jiang Co-Author
Texas A&M University

Renee Ge First Author

Renee Ge Presenting Author

Tuesday, Aug 5: 2:00 PM - 3:50 PM
1922
Contributed Posters

Music City Center

Sparse canonical correlation analysis (SCCA) identifies sparse linear combinations of two sets of features that are maximally correlated with each other. Although several SCCA methods extend this framework to more than two datasets, they assume that the different feature sets are measured within the same population. Here, we propose an extension of SCCA for settings with four data matrices derived from two distinct populations, each measured on two different feature sets. We reframe the correlation maximization problem as a minimization problem and decompose the original canonical weights into two separate components that capture the shared and the unique variance of each dataset. Through simulations, we show that our method recovers the true canonical weights more accurately than naïve methods that disregard either the shared or the unique components. In a real-data analysis, we apply our method to integrate two single-cell multiomic datasets of peripheral blood mononuclear cells (PBMCs) with simultaneous measurements of RNA expression and chromatin accessibility, and we benchmark its performance against widely used single-cell integration pipelines such as Seurat and Signac.
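To make the starting point concrete, here is a minimal sketch of plain two-set SCCA, the building block that the proposed method extends. It is not the authors' four-matrix method and does not implement the shared/unique decomposition. It assumes a simplified variant in the spirit of Witten, Tibshirani and Hastie's penalized matrix decomposition: within-set covariances are treated as identity, and fixed soft-threshold levels (the hypothetical parameters `lam_u` and `lam_v`) stand in for L1-norm bounds. The function names, toy data, and penalty values are illustrative assumptions.

```python
import math
import random


def soft_threshold(a, lam):
    """Elementwise soft-thresholding: sign(x) * max(|x| - lam, 0)."""
    return [math.copysign(max(abs(x) - lam, 0.0), x) for x in a]


def normalize(a):
    """Scale a vector to unit L2 norm; fail loudly if thresholding zeroed it."""
    norm = math.sqrt(sum(x * x for x in a))
    if norm == 0.0:
        raise ValueError("penalty too large: all weights were thresholded to zero")
    return [x / norm for x in a]


def cross_covariance(X, Y):
    """Sample cross-covariance X^T Y / (n - 1) after column-centering (rows = samples)."""
    n, p, q = len(X), len(X[0]), len(Y[0])
    mx = [sum(row[j] for row in X) / n for j in range(p)]
    my = [sum(row[k] for row in Y) / n for k in range(q)]
    return [[sum((X[i][j] - mx[j]) * (Y[i][k] - my[k]) for i in range(n)) / (n - 1)
             for k in range(q)] for j in range(p)]


def sparse_cca(X, Y, lam_u, lam_v, iters=100, tol=1e-8):
    """First pair of sparse canonical weights via alternating soft-thresholded power steps.

    Assumed simplification: within-set covariances are treated as identity, so each
    step multiplies by the cross-covariance, soft-thresholds, and renormalizes.
    """
    C = cross_covariance(X, Y)
    p, q = len(C), len(C[0])
    v = normalize([1.0] * q)
    u = [0.0] * p
    for _ in range(iters):
        u_new = normalize(soft_threshold(
            [sum(C[j][k] * v[k] for k in range(q)) for j in range(p)], lam_u))
        v_new = normalize(soft_threshold(
            [sum(C[j][k] * u_new[j] for j in range(p)) for k in range(q)], lam_v))
        delta = max(abs(a - b) for a, b in zip(u_new + v_new, u + v))
        u, v = u_new, v_new
        if delta < tol:
            break
    return u, v


def corr(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


# Hypothetical toy data: a latent signal z drives the first 3 columns of X
# and the first 2 columns of Y; every other column is pure noise.
random.seed(0)
n, p, q = 1000, 10, 8
z = [random.gauss(0, 1) for _ in range(n)]
X = [[(z[i] if j < 3 else 0.0) + random.gauss(0, 1) for j in range(p)] for i in range(n)]
Y = [[(z[i] if k < 2 else 0.0) + random.gauss(0, 1) for k in range(q)] for i in range(n)]

u, v = sparse_cca(X, Y, lam_u=0.3, lam_v=0.3)
support_u = [j for j, w in enumerate(u) if w != 0.0]
support_v = [k for k, w in enumerate(v) if w != 0.0]
rho = corr([sum(w * x for w, x in zip(u, row)) for row in X],
           [sum(w * y for w, y in zip(v, row)) for row in Y])
print(support_u, support_v, round(rho, 2))
```

On this toy data the recovered weights should be nonzero only on the signal columns (the first three of `X` and the first two of `Y`). Soft-thresholding is what produces exact zeros: larger penalties give sparser weights, and a penalty that zeroes every entry raises an error instead of returning a degenerate solution.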

Keywords

Sparse Canonical Correlation Analysis

Data Integration

Variance Decomposition

Single-Cell Multiomics

Main Sponsor

Section on Statistics in Genomics and Genetics