PPCA-XNORM: Harmonizing Multi-Platform Gene Expression Data via Probabilistic PCA
Monday, Aug 4: 9:50 AM - 10:05 AM
1534
Contributed Papers
Music City Center
Cross-platform normalization is essential for integrating gene expression data from multiple platforms, which improves statistical power and maximizes the utility of publicly available datasets. However, existing methods struggle to disentangle biological variability from platform-specific effects, particularly with small or unbalanced sample sizes or with data from more than two platforms. We propose PPCA-XNORM, a novel normalization framework based on Probabilistic Principal Component Analysis (PPCA), designed to address these limitations. Our model accounts for gene-specific platform effects through flexible location and scale adjustments while simultaneously capturing biological structure shared across genes via a low-rank between-gene correlation model. We develop a computationally efficient parameter estimation algorithm that combines conditional maximum likelihood estimation and gradient descent. Unlike previous methods, PPCA-XNORM supports normalization across three or more platforms, accommodates missing or unmatched samples during training, and transforms data between any pair of platforms via a closed-form conditional expectation, without retraining. Using both simulated data and real-world RNA-seq and microarray datasets, we demonstrate that PPCA-XNORM consistently outperforms existing approaches, including MatchMixeR and Shambhala-2, in preserving biological signals while removing platform-specific artifacts.
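To make the closed-form conditional expectation concrete, here is a minimal sketch under an assumed parametrization, not the authors' exact model. It supposes that a sample on platform p follows x_p = mu_p + diag(s_p) W z + noise, with a shared low-rank loading matrix W, a latent vector z ~ N(0, I), gene-specific location mu_p and scale s_p, and isotropic noise variance sigma2_p. The function name and all argument names are hypothetical. Under these assumptions, the standard PPCA posterior mean of z given platform-A data yields the expected platform-B profile directly.

```python
import numpy as np

def ppca_cross_platform(x_a, mu_a, mu_b, scale_a, scale_b, W, sigma2_a):
    """Map platform-A expression to platform B via E[x_B | x_A].

    Hypothetical sketch; assumes x_p = mu_p + diag(scale_p) @ W @ z + noise.
    x_a:      (n_samples, n_genes) expression on platform A
    mu_*:     (n_genes,) gene-specific location per platform
    scale_*:  (n_genes,) gene-specific scale per platform
    W:        (n_genes, q) shared low-rank loadings
    sigma2_a: isotropic noise variance on platform A
    """
    # Platform-specific loadings: each gene's row of W is rescaled.
    L_a = scale_a[:, None] * W
    L_b = scale_b[:, None] * W
    q = W.shape[1]

    # Standard PPCA posterior mean of z given x_a:
    # E[z | x_a] = (L_a^T L_a + sigma2_a I)^{-1} L_a^T (x_a - mu_a)
    M = L_a.T @ L_a + sigma2_a * np.eye(q)
    z_mean = np.linalg.solve(M, L_a.T @ (x_a - mu_a).T)  # shape (q, n_samples)

    # Push the latent mean through platform B's location and loadings.
    return mu_b + (L_b @ z_mean).T
```

Because the mapping depends only on fitted parameters for the source and target platforms, the same function would serve any ordered pair of platforms in such a model, which is consistent with the abstract's claim that no retraining is needed.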
Cross-platform normalization
Probabilistic PCA
Platform-specific bias
Gene expression harmonization
Main Sponsor
Section on Statistics in Genomics and Genetics