Learning Across Boundaries: Statistical and Machine Learning Methods for Biomedical Data Fusion

Haolei Weng, Chair
Michigan State University

Guanqun Cao, Discussant
Michigan State University

Lu Xia, Organizer
Michigan State University
 
Monday, Aug 3: 2:00 PM - 3:50 PM
1258 
Invited Paper Session 

Applied

No

Main Sponsor

ENAR

Co Sponsors

Biometrics Section
Section on Statistical Learning and Data Science

Presentations

Transfer Learning for Linear Regression with Mismatched Covariates

Transfer learning is a powerful approach for improving model performance in a study of interest by leveraging data from related auxiliary studies. In this paper, we propose a novel transfer learning method to develop optimal linear predictors for continuous outcomes using datasets with differing sets of predictors. We address two challenges in this setting: distributional differences and covariate mismatch. The former refers to variation in data distributions across studies. The latter refers to discrepancies in the covariates measured across studies, which result in mismatched feature spaces. Because direct data integration is not feasible, we extend the DISCOM framework (direct sparse regression procedure using covariance from multimodality data) with fusion learning to accommodate heterogeneous data sources. We demonstrate the robustness and efficacy of the proposed method through extensive simulation studies and an application to treatment utilization among ICU patients diagnosed with sepsis.
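
To make the covariance-based idea concrete, the following is a minimal sketch, not the authors' method: it assumes rows from all studies are stacked with NaN marking covariates a study did not measure, estimates each covariance entry from the rows where both covariates were observed, and then fits a lasso that needs only those moments. The function names, the diagonal-shrinkage step, and the plain coordinate-descent solver are illustrative assumptions, and the fusion-learning extension is not shown.

```python
import numpy as np

def pairwise_moments(X, y):
    """Estimate Cov(X) and Cov(X, y) from rows stacked across studies.

    NaN in X marks a covariate that a study did not measure. Each covariate
    is assumed to be measured in at least one study. Pairs of covariates
    that are never measured together get a covariance of 0 (an assumption).
    """
    obs = ~np.isnan(X)
    mu = np.nanmean(X, axis=0)
    Xc = np.where(obs, X - mu, 0.0)                    # centered; 0 where unmeasured
    n_pair = obs.T.astype(float) @ obs.astype(float)   # co-observation counts
    Sigma = (Xc.T @ Xc) / np.maximum(n_pair, 1.0)
    c = (Xc.T @ (y - y.mean())) / np.maximum(obs.sum(axis=0), 1)
    return Sigma, c

def shrink_to_diagonal(Sigma, alpha):
    """Blend Sigma with its diagonal; pairwise estimates need not be
    positive semidefinite, and shrinkage is one simple remedy."""
    return alpha * Sigma + (1.0 - alpha) * np.diag(np.diag(Sigma))

def lasso_from_moments(Sigma, c, lam, n_iter=500):
    """Coordinate descent for 0.5 * b'Sigma b - c'b + lam * ||b||_1."""
    beta = np.zeros(len(c))
    for _ in range(n_iter):
        for j in range(len(c)):
            # partial residual correlation, excluding coordinate j itself
            r = c[j] - Sigma[j] @ beta + Sigma[j, j] * beta[j]
            # soft-thresholding update
            beta[j] = np.sign(r) * max(abs(r) - lam, 0.0) / Sigma[j, j]
    return beta
```

In this sketch the outcome enters only through the moment vector, which is why studies with different covariate sets can still contribute: each one informs the covariance entries it observes. The tuning values `alpha` and `lam` would have to be chosen by the user, for example by validation on the target study.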

Speaker

Lu Tang, University of Pittsburgh

Robust Estimation and Inference in Hybrid Controlled Trials

Hybrid controlled trials (HCTs) combine randomized controlled trials (RCTs) with external control data to enhance efficiency, but bias may arise when external controls differ systematically from trial participants. We propose conformal selective borrowing, a novel framework with automatic tuning that adaptively incorporates external data while preserving valid post-selection inference through randomization tests. This method unites modern conformal prediction techniques from machine learning with classical randomization principles pioneered by Fisher, improving statistical power while maintaining exact finite-sample type I error control. The framework offers a rigorous and flexible approach for generating credible evidence in settings where RCTs are small or patient accrual is slow. We illustrate its utility across continuous, binary, and time-to-event outcomes, present new theoretical results, and demonstrate its application in a non-small cell lung cancer case study. 
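
A heavily simplified sketch of how selection and randomization inference can fit together is given below. It is an illustration under stated assumptions rather than the authors' algorithm: it ignores covariates, uses the absolute distance from the trial-control mean as a crude in-sample conformity score, and fixes the threshold `gamma` by hand instead of tuning it automatically. All function names are hypothetical. The key point it tries to show is that the selection step is recomputed inside every re-randomization, so that under Fisher's sharp null the p-value accounts for the selection.

```python
import numpy as np

def select_external(y_ctrl, y_ext, gamma):
    """Keep external controls whose conformal p-value exceeds gamma."""
    center = y_ctrl.mean()
    s_ctrl = np.abs(y_ctrl - center)   # scores of trial controls
    s_ext = np.abs(y_ext - center)     # scores of external controls
    # count trial-control scores at least as large as each external score
    n_ge = (s_ctrl[None, :] >= s_ext[:, None]).sum(axis=1)
    p = (1.0 + n_ge) / (len(y_ctrl) + 1.0)
    return p > gamma

def borrowing_statistic(y, a, y_ext, gamma):
    """Absolute difference in means, treated vs. pooled controls."""
    keep = select_external(y[a == 0], y_ext, gamma)
    pooled = np.concatenate([y[a == 0], y_ext[keep]])
    return abs(y[a == 1].mean() - pooled.mean())

def randomization_p_value(y, a, y_ext, gamma=0.1, n_draws=999, seed=0):
    """Monte Carlo Fisher randomization test under the sharp null.

    Treatment labels are permuted among trial participants only (a
    completely randomized design is assumed); external controls stay fixed.
    """
    rng = np.random.default_rng(seed)
    t_obs = borrowing_statistic(y, a, y_ext, gamma)
    t_null = np.array([
        borrowing_statistic(y, rng.permutation(a), y_ext, gamma)
        for _ in range(n_draws)
    ])
    return (1.0 + (t_null >= t_obs).sum()) / (n_draws + 1.0)
```

Here `y` and `a` hold outcomes and 0/1 treatment indicators for trial participants, and `y_ext` holds external-control outcomes. Handling binary or time-to-event outcomes would need a different test statistic and conformity score than the ones sketched here.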

Keywords

causal inference

conformal prediction

data integration

external control

randomization inference 

Speaker

Ke Zhu, NCSU and Duke

Co-Author(s)

Jiajun Liu, Duke University School of Medicine
Shu Yang, North Carolina State University, Department of Statistics
Xiaofei Wang, Duke University Medical Center

Active Unsupervised Domain Adaptation with Deep Learning

Unsupervised domain adaptation aims to transfer predictive knowledge from a labeled source dataset to an unlabeled target dataset whose feature distribution differs. While recent deep learning approaches have shown success in aligning latent representations across domains, a fundamental challenge remains: determining when the source information is truly transferable to the target problem. In this work, we propose a deep active unsupervised domain adaptation framework that integrates active learning principles into the domain adaptation process. Our method strategically selects a small subset of target samples for labeling based on model uncertainty and representativeness in the learned latent space, thereby maximizing the informational value of limited labeling effort. These selectively labeled data enable a formal assessment of transferability between the source and target domains. This study highlights the importance of adaptive sample selection in bridging domain gaps and guiding data-efficient model adaptation in high-dimensional settings.
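
One simple way to combine uncertainty and representativeness, shown here only as a hedged sketch and not as the selection rule used in this work, is an information-density style score: predictive entropy multiplied by the average cosine similarity of a sample to the other target samples in latent space. The function name, the inputs `probs` (predicted class probabilities) and `Z` (latent embeddings), and the exponent `beta` are assumptions made for illustration.

```python
import numpy as np

def select_for_labeling(probs, Z, k, beta=1.0):
    """Return indices of the k target samples with the highest score.

    probs: (n, n_classes) predicted probabilities on the target set.
    Z:     (n, d) latent embeddings of the same samples (nonzero rows).
    """
    p = np.clip(probs, 1e-12, 1.0)                      # avoid log(0)
    entropy = -(p * np.log(p)).sum(axis=1)              # model uncertainty
    Zn = Z / np.linalg.norm(Z, axis=1, keepdims=True)   # unit-length rows
    density = (Zn @ Zn.T).mean(axis=1)                  # mean cosine similarity
    score = entropy * np.clip(density, 0.0, None) ** beta
    return np.argsort(-score, kind="stable")[:k]
```

A sample scores highly only if the model is unsure about it and it sits in a dense region of the target embeddings, so isolated outliers are down-weighted. Larger `beta` puts more weight on representativeness.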

Speaker

Lu Xia, Michigan State University