Learning Across Boundaries: Statistical and Machine Learning Methods for Biomedical Data Fusion
Yun Wei
Chair
University of Texas at Dallas
Lu Xia
Organizer
Michigan State University
Monday, Aug 3: 2:00 PM - 3:50 PM
1258
Invited Paper Session
Thomas M. Menino Convention & Exhibition Center
Room: CC-252B
Applied
No
Main Sponsor
ENAR
Co Sponsors
Biometrics Section
Section on Statistical Learning and Data Science
Presentations
Transfer learning is a powerful approach for improving model performance in a study of interest by leveraging data from related auxiliary studies. In this paper, we propose a novel transfer learning method to develop optimal linear predictors for continuous outcomes using datasets with differing sets of predictors. We address two challenges involved in this setting: distributional difference and covariate mismatch. The former refers to variations in data distributions across studies. The latter pertains to discrepancies in the measured covariates across studies, which result in mismatched feature spaces. Because direct data integration is not feasible, we extend the DISCOM framework (direct sparse regression procedure using covariance from multimodality data) with fusion learning to accommodate heterogeneous data sources. We demonstrate the robustness and efficacy of our proposed method through extensive simulation studies and an application to treatment utilization among ICU patients diagnosed with sepsis.
Speaker
Lu Tang, University of Pittsburgh
Hybrid controlled trials (HCTs) combine randomized controlled trials (RCTs) with external control data to enhance efficiency, but bias may arise when external controls differ systematically from trial participants. We propose conformal selective borrowing, a novel framework with automatic tuning that adaptively incorporates external data while preserving valid post-selection inference through randomization tests. This method unites modern conformal prediction techniques from machine learning with classical randomization principles pioneered by Fisher, improving statistical power while maintaining exact finite-sample type I error control. The framework offers a rigorous and flexible approach for generating credible evidence in settings where RCTs are small or patient accrual is slow. We illustrate its utility across continuous, binary, and time-to-event outcomes, present new theoretical results, and demonstrate its application in a non-small cell lung cancer case study.
Keywords
causal inference
conformal prediction
data integration
external control
randomization inference
Speaker
Ke Zhu, NCSU and Duke
Co-Author(s)
Jiajun Liu, Duke University School of Medicine
Shu Yang, North Carolina State University, Department of Statistics
Xiaofei Wang, Duke University Medical Center
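The abstract above relies on Fisher-style randomization tests for finite-sample type I error control. As a rough illustration of that classical ingredient only (not the authors' conformal selective borrowing procedure, and with no external controls involved), a minimal sketch of a Monte Carlo randomization test for a two-arm trial might look like the following. The function name `randomization_test`, its parameters, and the choice of an absolute difference in means as the test statistic are all assumptions made for this example.

```python
import numpy as np


def randomization_test(y, z, n_draws=5000, seed=0):
    """Monte Carlo randomization test of the sharp null of no treatment effect.

    y : outcomes, one per participant
    z : boolean treatment indicators (True = treated), assumed to come
        from complete randomization with a fixed number treated
    Hypothetical helper for illustration; not taken from the abstract.
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    z = np.asarray(z, dtype=bool)

    def stat(assign):
        # Absolute difference in arm means under a given assignment.
        return abs(y[assign].mean() - y[~assign].mean())

    observed = stat(z)
    # Under the sharp null the outcomes are fixed and only the assignment
    # is random, so we re-draw assignments by permuting the labels.
    draws = np.array([stat(rng.permutation(z)) for _ in range(n_draws)])
    # The +1 in numerator and denominator keeps the Monte Carlo p-value valid.
    return (1 + np.sum(draws >= observed)) / (n_draws + 1)
```

Calling `randomization_test(y, z)` returns a p-value in (0, 1]. The permutation step matches complete randomization; a trial with a different design (for example stratified or blocked assignment) would need to re-draw assignments from that design instead.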