PPCA-XNORM: Harmonizing Multi-Platform Gene Expression Data via Probabilistic PCA

Disa Yu Co-Author
Sanofi
 
Jinfeng Zhang Co-Author
Florida State University
 
Xing Qiu Co-Author
 
Zhining Sui First Author
 
Zhining Sui Presenting Author
 
Monday, Aug 4: 9:50 AM - 10:05 AM
1534 
Contributed Papers 
Music City Center 
Cross-platform normalization is essential for integrating gene expression data from multiple platforms to improve statistical power and maximize the utility of publicly available datasets. However, existing methods struggle to disentangle biological variability from platform-specific effects, particularly when sample sizes are small or unbalanced or when the data come from more than two platforms. We propose PPCA-XNORM, a novel normalization framework based on Probabilistic Principal Component Analysis (PPCA), designed to address these limitations. Our model accounts for gene-specific platform effects through flexible location and scale adjustments while simultaneously capturing biological structure shared across genes via a low-rank between-gene correlation model. We develop a computationally efficient parameter estimation algorithm that combines conditional maximum likelihood estimation and gradient descent. Unlike previous methods, PPCA-XNORM supports normalization across three or more platforms, accommodates missing or unmatched samples during training, and transforms data between any pair of platforms via a closed-form conditional expectation, without retraining. Using both simulated data and real-world RNA-seq and microarray datasets, we demonstrate that PPCA-XNORM consistently outperforms existing approaches, including MatchMixeR and Shambhala-2, in preserving biological signals while removing platform-specific artifacts.
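
To illustrate what a closed-form conditional expectation looks like in a PPCA-style model, the following is a minimal sketch and not the PPCA-XNORM implementation. It assumes a generic joint Gaussian factor model in which two platforms A and B share a latent vector z, with x_A = mu_A + W_A z + noise and x_B = mu_B + W_B z + noise, and independent diagonal noise on each platform. Under that assumption, E[x_B | x_A] = mu_B + W_B W_A^T (W_A W_A^T + Psi_A)^{-1} (x_A - mu_A). All names (`conditional_mean`, `W_a`, `W_b`, `psi_a`, and so on) and the toy parameter values are hypothetical; in practice the parameters would come from a fitted model.

```python
import numpy as np


def conditional_mean(x_a, mu_a, mu_b, W_a, W_b, psi_a):
    """E[x_B | x_A] under an assumed joint Gaussian factor model (illustrative only)."""
    # Marginal covariance of platform A: low-rank part plus diagonal noise.
    C_aa = W_a @ W_a.T + np.diag(psi_a)
    # Cross-covariance between platforms B and A; the noise terms are
    # assumed independent, so only the shared latent part contributes.
    C_ba = W_b @ W_a.T
    # Solve the linear system rather than forming an explicit inverse.
    return mu_b + C_ba @ np.linalg.solve(C_aa, x_a - mu_a)


# Toy parameters (hypothetical; a real analysis would estimate these from data).
rng = np.random.default_rng(0)
n_genes, rank = 5, 2
W_a = rng.normal(size=(n_genes, rank))   # loadings for platform A
W_b = rng.normal(size=(n_genes, rank))   # loadings for platform B
mu_a = rng.normal(size=n_genes)          # gene-specific location, platform A
mu_b = rng.normal(size=n_genes)          # gene-specific location, platform B
psi_a = np.full(n_genes, 0.1)            # diagonal noise variances, platform A

# One simulated platform-A profile, mapped to its expected platform-B profile.
x_a = mu_a + W_a @ rng.normal(size=rank)
x_b_hat = conditional_mean(x_a, mu_a, mu_b, W_a, W_b, psi_a)
```

Because the mapping depends only on the fitted parameters, the same function can be applied to new samples, or to a different pair of platforms, without refitting, which is the property the abstract refers to.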

Keywords

Cross-platform normalization

Probabilistic PCA

Platform-specific bias

Gene expression harmonization 

Main Sponsor

Section on Statistics in Genomics and Genetics