Alignment of Untargeted Data through their Covariances: A Novel Perspective on a Classical Tool in Optimal Transport

Speaker: George Stepaniants
 
Wednesday, Aug 6: 2:05 PM - 2:35 PM
Invited Paper Session 
Music City Center 
Feature alignment is a core challenge in statistics and machine learning, with critical applications in biostatistics, particularly in the alignment of untargeted metabolomics, proteomics, and lipidomics studies. These studies measure unlabeled compounds across patient cohorts, allowing for novel biomarker discovery but presenting complex feature matching problems when comparing, pooling, or annotating datasets. Traditional alignment methods from computer science often fail to capture the biological constraints required in such tasks. To address this, we explore the use of optimal transport—specifically, the Gromov–Wasserstein (GW) algorithm—for aligning features across biological datasets. We introduce GromovMatcher, a constrained GW solver, which demonstrates robust and accurate feature matching in real-world metabolomic studies of liver and pancreatic cancer, highlighting its utility in metabolomic data analysis.

Motivated by these results, we propose a new statistical framework for feature alignment between two unlabeled datasets whose features follow a Gaussian distribution with an unknown covariance structure. The key challenge is to recover the permutation of the features of one dataset relative to the other. We develop both a quasi-maximum likelihood estimator (QMLE) and a GW-based approach to solve this "covariance alignment" problem, framing it as a quadratic assignment problem. We demonstrate experimentally that computation of the GW estimator scales favorably via Sinkhorn optimization. Our theoretical analysis shows that both QMLE and GW estimators achieve minimax-optimal statistical rates, offering the first statistical justification for using GW in feature alignment.
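
To make the covariance alignment setup concrete, here is a minimal Python sketch on synthetic data. It is not the QMLE or the GW estimator from the talk: it simply searches all permutations for the one that minimizes the squared Frobenius distance between the two sample covariance matrices, which is a small quadratic assignment problem and is only feasible for a handful of features. The function name `align_by_covariance`, the covariance matrix `sigma`, and the sizes `d` and `n` are all assumptions chosen for illustration.

```python
import itertools

import numpy as np


def align_by_covariance(cov_x, cov_y):
    """Brute-force search for the feature ordering of Y that best matches X.

    Returns an index array `idx` minimizing
    sum((cov_x - cov_y[idx][:, idx]) ** 2) over all permutations.
    Illustrative only: the cost of the search grows as d!.
    """
    d = cov_x.shape[0]
    best_perm, best_cost = None, np.inf
    for perm in itertools.permutations(range(d)):
        idx = np.array(perm)
        # Reorder rows and columns of cov_y by the candidate permutation.
        cost = np.sum((cov_x - cov_y[np.ix_(idx, idx)]) ** 2)
        if cost < best_cost:
            best_perm, best_cost = idx, cost
    return best_perm


rng = np.random.default_rng(0)
d, n = 5, 5000  # hypothetical number of features and samples per dataset

# Hypothetical ground-truth covariance: distinct variances, decaying correlation.
variances = np.arange(1.0, d + 1.0)
corr = 0.5 ** np.abs(np.subtract.outer(np.arange(d), np.arange(d)))
sigma = np.sqrt(np.outer(variances, variances)) * corr

# Two independent Gaussian datasets; the features of Y are shuffled.
true_perm = rng.permutation(d)
X = rng.multivariate_normal(np.zeros(d), sigma, size=n)
Y = rng.multivariate_normal(np.zeros(d), sigma, size=n)[:, true_perm]

cov_x = np.cov(X, rowvar=False)
cov_y = np.cov(Y, rowvar=False)

# Column j of Y holds original feature true_perm[j], so the ordering that
# realigns Y with X is the inverse permutation, np.argsort(true_perm).
recovered = align_by_covariance(cov_x, cov_y)
print("recovered ordering:", recovered)
print("inverse of true permutation:", np.argsort(true_perm))
```

After alignment, `Y[:, recovered]` lists the features of Y in the same order as those of X. The exhaustive search is what makes this sketch impractical beyond a few features, which is the motivation for scalable relaxations such as the Sinkhorn-based GW computation mentioned above.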

This work is part of my PhD research with Philippe Rigollet and Yanjun Han at MIT, in collaboration with Vivian Viallon's group at IARC in Lyon.