Interpretable, simultaneous relevant components from multimodal data with structured iSSANOVA-PCA

Rafael Irizarry Co-Author
Dana-Farber Cancer Institute
 
Senthil Kumar Muthiah First Author
 
Senthil Kumar Muthiah Presenting Author
 
Thursday, Aug 7: 11:35 AM - 11:50 AM
2342 
Contributed Papers 
Music City Center 
Isolating relevant variation in data and decomposing it into interpretable processes is critical for hypothesis-driven research. In multimodal data analysis, classical simultaneous components analysis can prioritize irrelevant variance components (e.g., in integrative genomics data analysis of thousands of features). Supervised PCA-type methods, developed for prediction tasks, limit us to situations with a measured response, assume that relevance is captured by the response alone, and do not always lead to stable interpretations. We propose a semiparametric approach in which the simultaneous components are modeled as functions in a reproducing kernel Hilbert space, a setting conducive to statistical modeling that yields smoothing spline ANOVA decompositions. The result is a sequence of components (processes), ranked by how much of the total relevant data variation they explain; the processes themselves are largely explained in terms of simple functions of known predictors, enhancing interpretability. The quality of the relevant inferences obtained for a variety of research questions (e.g., cancer progression, host-pathogen response) demonstrates the advantage of the proposal over competing methods.
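To illustrate the limitation of classical simultaneous components analysis noted in the abstract, the following is a minimal sketch on simulated data. It is not the proposed iSSANOVA-PCA method. It assumes that classical simultaneous components can be approximated by an SVD of the column-centered, feature-concatenated data blocks, and every name in it (simultaneous_components, make_block, relevant, nuisance) is hypothetical. The nuisance signal is deliberately given more variance than the relevant one.

```python
import numpy as np


def simultaneous_components(blocks, n_components):
    """Sketch of classical simultaneous components: SVD of concatenated blocks.

    Each block is a (samples x features) array measured on the same samples.
    """
    centered = [b - b.mean(axis=0) for b in blocks]   # center every feature
    stacked = np.hstack(centered)                     # concatenate modalities
    u, s, _ = np.linalg.svd(stacked, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]   # shared sample scores
    explained = (s ** 2) / np.sum(s ** 2)             # variance fractions
    return scores, explained[:n_components]


rng = np.random.default_rng(0)
n_samples = 200

# Hypothetical latent processes: one of scientific interest, one nuisance
# (for example, a batch effect) that carries far more variance.
relevant = np.linspace(-1.0, 1.0, n_samples)
nuisance = rng.normal(size=n_samples)


def make_block(n_features):
    """Simulate one modality driven by both latent processes plus noise."""
    return (
        np.outer(relevant, rng.normal(size=n_features))
        + 5.0 * np.outer(nuisance, rng.normal(size=n_features))
        + 0.1 * rng.normal(size=(n_samples, n_features))
    )


block_a = make_block(50)
block_b = make_block(80)

scores, explained = simultaneous_components([block_a, block_b], n_components=2)

# Compare how strongly the leading component tracks each latent process.
corr_nuisance = abs(np.corrcoef(scores[:, 0], nuisance)[0, 1])
corr_relevant = abs(np.corrcoef(scores[:, 0], relevant)[0, 1])
print("explained variance fractions:", explained)
print("|corr| of component 1 with nuisance:", corr_nuisance)
print("|corr| of component 1 with relevant:", corr_relevant)
```

In this simulation the leading component is ranked purely by variance, so it follows the nuisance process rather than the relevant one, which is the behavior a relevance-aware decomposition is meant to avoid.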

Keywords

pca

multi-modal

integrative

data analysis

interpretation

ssanova 

Main Sponsor

Biometrics Section