Learning Across Boundaries: Statistical and Machine Learning Methods for Biomedical Data Fusion

Yun Wei, Chair
University of Texas at Dallas
 
Guanqun Cao, Discussant
Michigan State University
 
Lu Xia, Organizer
Michigan State University
 
Monday, Aug 3: 2:00 PM - 3:50 PM
1258 
Invited Paper Session 
Thomas M. Menino Convention & Exhibition Center 
Room: CC-252B 

Applied: No

Main Sponsor

ENAR

Co Sponsors

Biometrics Section
Section on Statistical Learning and Data Science

Presentations

Transfer Learning for Linear Regression with Mismatched Covariates

Transfer learning is a powerful approach for improving model performance in a study of interest by leveraging data from related auxiliary studies. In this paper, we propose a novel transfer learning method to develop optimal linear predictors for continuous outcomes using datasets with differing sets of predictors. We address two challenges involved in this setting: distributional difference and covariate mismatch. The former refers to variations in data distributions across studies. The latter pertains to discrepancies in the measured covariates across studies, which result in mismatched feature spaces. Because direct data integration is not feasible, we extend the DISCOM framework (direct sparse regression procedure using covariance from multimodality data) with fusion learning to accommodate heterogeneous data sources. We demonstrate the robustness and efficacy of our proposed method through extensive simulation studies and an application to treatment utilization among ICU patients diagnosed with sepsis.
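As a rough illustration of why covariate mismatch does not rule out a linear predictor: least-squares coefficients depend on the data only through second moments, so each covariance entry can be estimated from whichever studies measure that pair of variables. The Python sketch below is not the authors' method. It omits the sparsity penalty and the fusion learning step, and it assumes (for illustration only) that both simulated studies follow the same distribution. The data, sample sizes, and variable names are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
beta_true = np.array([1.0, -2.0, 0.5])  # hypothetical coefficients


def simulate(n):
    """Draw n observations from a simple linear model (illustrative only)."""
    X = rng.normal(size=(n, 3))
    y = X @ beta_true + rng.normal(size=n)
    return X, y


def center(a):
    """Subtract column means so cross-products estimate covariances."""
    return a - a.mean(axis=0)


# Target study: small, measures all three covariates.
X_t, y_t = simulate(300)
# Auxiliary study: large, but the third covariate was never measured.
X_a, y_a = simulate(3000)
X_a = X_a[:, :2]

Xc_t, yc_t = center(X_t), center(y_t)
Xc_a, yc_a = center(X_a), center(y_a)

# Pool both studies for the pairs they both measure (x1, x2, y).
X_shared = np.vstack([Xc_t[:, :2], Xc_a])
y_shared = np.concatenate([yc_t, yc_a])
n_pool = X_shared.shape[0]
n_t = Xc_t.shape[0]

# Assemble the covariance pieces entry by entry.
sigma_xx = Xc_t.T @ Xc_t / n_t                 # entries involving x3: target only
sigma_xx[:2, :2] = X_shared.T @ X_shared / n_pool  # shared block: pooled
sigma_xy = Xc_t.T @ yc_t / n_t                 # cov(x3, y): target only
sigma_xy[:2] = X_shared.T @ y_shared / n_pool  # shared entries: pooled

# Solve the normal equations built from the assembled moments.
beta_hat = np.linalg.solve(sigma_xx, sigma_xy)
print(beta_hat)
```

A matrix assembled this way is not guaranteed to be positive semidefinite, and pooling is only sensible when the studies share a distribution. The abstract names distributional difference as a separate challenge, which this sketch does not handle.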

Speaker

Lu Tang, University of Pittsburgh

Robust Estimation and Inference in Hybrid Controlled Trials

Hybrid controlled trials (HCTs) combine randomized controlled trials (RCTs) with external control data to enhance efficiency, but bias may arise when external controls differ systematically from trial participants. We propose conformal selective borrowing, a novel framework with automatic tuning that adaptively incorporates external data while preserving valid post-selection inference through randomization tests. This method unites modern conformal prediction techniques from machine learning with classical randomization principles pioneered by Fisher, improving statistical power while maintaining exact finite-sample type I error control. The framework offers a rigorous and flexible approach for generating credible evidence in settings where RCTs are small or patient accrual is slow. We illustrate its utility across continuous, binary, and time-to-event outcomes, present new theoretical results, and demonstrate its application in a non-small cell lung cancer case study. 
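For readers unfamiliar with the classical randomization principle mentioned above, the following Python sketch shows a Fisher randomization test on a tiny, made-up randomized trial. It is not conformal selective borrowing and uses no external controls. It only illustrates how enumerating every possible treatment assignment yields an exact finite-sample p-value under the sharp null of no treatment effect. The outcomes, the function name `randomization_test`, and the difference-in-means statistic are assumptions chosen for illustration.

```python
from itertools import combinations


def randomization_test(outcomes, treated_idx):
    """Two-sided Fisher randomization test with a difference-in-means statistic.

    Under the sharp null, outcomes are fixed and only the assignment is random,
    so the statistic is recomputed for every way of choosing the treated units.
    """
    n = len(outcomes)
    m = len(treated_idx)

    def diff_in_means(idx):
        idx = set(idx)
        treated = [outcomes[i] for i in idx]
        control = [outcomes[i] for i in range(n) if i not in idx]
        return sum(treated) / len(treated) - sum(control) / len(control)

    observed = abs(diff_in_means(treated_idx))
    stats = [abs(diff_in_means(c)) for c in combinations(range(n), m)]
    # Small tolerance guards against floating-point ties.
    extreme = sum(s >= observed - 1e-12 for s in stats)
    return extreme / len(stats)


# Hypothetical trial: units 0-3 treated, units 4-7 control.
outcomes = [5.0, 6.0, 7.0, 8.0, 1.0, 2.0, 3.0, 4.0]
p_value = randomization_test(outcomes, treated_idx=[0, 1, 2, 3])
print(p_value)  # 2 of the 70 assignments are as extreme: 2/70
```

Full enumeration is feasible only for small trials. With larger samples, a random sample of assignments is typically used instead.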

Keywords

causal inference

conformal prediction

data integration

external control

randomization inference 

Speaker

Ke Zhu, North Carolina State University and Duke University

Co-Author(s)

Jiajun Liu, Duke University School of Medicine
Shu Yang, North Carolina State University, Department of Statistics
Xiaofei Wang, Duke University Medical Center