Monday, Aug 3: 2:00 PM - 3:50 PM
1258
Invited Paper Session
Applied
No
Main Sponsor
ENAR
Co Sponsors
Biometrics Section
Section on Statistical Learning and Data Science
Presentations
Transfer learning is a powerful approach for improving model performance in a study of interest by leveraging data from related auxiliary studies. In this paper, we propose a novel transfer learning method to develop optimal linear predictors for continuous outcomes using datasets with differing sets of predictors. We address two challenges involved in this setting: distributional difference and covariate mismatch. The former refers to variations in data distributions across studies. The latter pertains to discrepancies in the measured covariates across studies, which result in mismatched feature spaces. Because direct data integration is not feasible, we extend the direct sparse regression procedure using covariance from multimodality data (DISCOM) framework with fusion learning to accommodate heterogeneous data sources. We demonstrate the robustness and efficacy of our proposed method through extensive simulation studies and an application to treatment utilization among ICU patients diagnosed with sepsis.
Speaker
Lu Tang, University of Pittsburgh
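One way to see why covariance-based procedures suit the mismatched-covariate setting in the abstract above: a sparse linear predictor can be fit from a predictor covariance matrix and a predictor-outcome cross-covariance vector alone, without pooled row-level data. The sketch below is a generic covariance-input lasso solved by coordinate descent. It is not the DISCOM estimator or the authors' fusion-learning extension; the function names, the toy inputs, and the penalty value are assumptions made for illustration.

```python
import numpy as np

def soft_threshold(z, lam):
    """Soft-thresholding operator used by lasso coordinate descent."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def covariance_lasso(sigma, c, lam, n_iter=200):
    """Minimize 0.5 * b' sigma b - c' b + lam * ||b||_1 by coordinate descent.

    sigma: (p, p) estimated covariance matrix of the predictors
    c:     (p,)   estimated cross-covariance between predictors and outcome
    lam:   non-negative L1 penalty (hypothetical tuning value)
    """
    p = len(c)
    b = np.zeros(p)
    for _ in range(n_iter):
        for j in range(p):
            # Cross-covariance left unexplained by the other coordinates
            r = c[j] - sigma[j] @ b + sigma[j, j] * b[j]
            b[j] = soft_threshold(r, lam) / sigma[j, j]
    return b

# Toy inputs (assumed): uncorrelated predictors, one strong and one weak signal
sigma = np.eye(2)
c = np.array([1.0, 0.05])
beta = covariance_lasso(sigma, c, lam=0.1)
```

In a multi-study setting, each entry of `sigma` and `c` could in principle be estimated from whichever studies measured the relevant pair of variables, which is what makes the covariance formulation attractive when feature sets differ.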
Hybrid controlled trials (HCTs) combine randomized controlled trials (RCTs) with external control data to enhance efficiency, but bias may arise when external controls differ systematically from trial participants. We propose conformal selective borrowing, a novel framework with automatic tuning that adaptively incorporates external data while preserving valid post-selection inference through randomization tests. This method unites modern conformal prediction techniques from machine learning with classical randomization principles pioneered by Fisher, improving statistical power while maintaining exact finite-sample type I error control. The framework offers a rigorous and flexible approach for generating credible evidence in settings where RCTs are small or patient accrual is slow. We illustrate its utility across continuous, binary, and time-to-event outcomes, present new theoretical results, and demonstrate its application in a non-small cell lung cancer case study.
Keywords
causal inference
conformal prediction
data integration
external control
randomization inference
Speaker
Ke Zhu, NCSU and Duke
Co-Author(s)
Jiajun Liu, Duke University School of Medicine
Shu Yang, North Carolina State University, Department of Statistics
Xiaofei Wang, Duke University Medical Center
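The classical randomization principle that the abstract above builds on can be sketched with a plain Fisher randomization test. The code below covers only that classical ingredient for a completely randomized trial with a difference-in-means statistic. It does not implement conformal selective borrowing or any use of external controls; the function name, the toy data, and the number of draws are assumptions for illustration.

```python
import numpy as np

def randomization_test(y, treated, n_draws=2000, seed=0):
    """Monte Carlo Fisher randomization test of the sharp null of no effect.

    Assumes a completely randomized design: each re-draw permutes the
    observed assignment vector, keeping the number of treated units fixed.
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    treated = np.asarray(treated, dtype=bool)

    def diff_in_means(assign):
        return y[assign].mean() - y[~assign].mean()

    observed = diff_in_means(treated)
    count = 0
    for _ in range(n_draws):
        perm = rng.permutation(treated)
        if abs(diff_in_means(perm)) >= abs(observed):
            count += 1
    # The +1 terms count the observed assignment, keeping the p-value valid
    p_value = (1 + count) / (1 + n_draws)
    return observed, p_value

# Toy data (assumed): four treated units followed by four controls
y = [5, 6, 7, 8, 1, 2, 3, 4]
treated = [1, 1, 1, 1, 0, 0, 0, 0]
observed, p_value = randomization_test(y, treated)
```

Because the reference distribution comes from the known assignment mechanism rather than a model for the outcomes, the test controls type I error in finite samples under the sharp null, which is the property the abstract refers to.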
Unsupervised domain adaptation aims to transfer predictive knowledge from a labeled source dataset to an unlabeled target dataset whose feature distributions differ. While recent deep learning approaches have shown success in aligning latent representations across domains, a fundamental challenge remains: determining when the source information is truly transferable to the target problem. In this work, we propose a deep active unsupervised domain adaptation framework that integrates active learning principles into the domain adaptation process. Our method strategically selects a small subset of target samples for labeling based on model uncertainty and representativeness in the learned latent space, thereby maximizing the informational value of limited labeling effort. These selectively labeled data will enable formal assessment of transferability between the source and target domains. This study highlights the importance of adaptive sample selection in bridging domain gaps and guiding data-efficient model adaptation in high-dimensional settings.
Speaker
Lu Xia, Michigan State University
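The uncertainty-based part of the sample selection described in the abstract above can be illustrated with entropy sampling: rank unlabeled target samples by the entropy of their predicted class probabilities and request labels for the most uncertain ones. This is a generic active-learning heuristic, not the authors' framework, and it omits the representativeness criterion in the latent space. The function name and the toy probabilities are assumptions.

```python
import numpy as np

def select_by_entropy(probs, k):
    """Return indices of the k samples with highest predictive entropy.

    probs: (n, n_classes) predicted class probabilities for target samples
    k:     labeling budget (hypothetical)
    """
    probs = np.asarray(probs, dtype=float)
    eps = 1e-12  # guards against log(0)
    entropy = -np.sum(probs * np.log(probs + eps), axis=1)
    # Sort from most to least uncertain; stable sort keeps ties in input order
    order = np.argsort(-entropy, kind="stable")
    return order[:k], entropy

# Toy predictions (assumed) for four unlabeled target samples
probs = [[0.99, 0.01], [0.5, 0.5], [0.8, 0.2], [0.6, 0.4]]
chosen, entropy = select_by_entropy(probs, k=2)
```

Here the two samples whose predictions are closest to uniform are selected. A fuller version would combine this score with a diversity or coverage term so that the labeled subset also represents the target distribution.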