Merging Versus Ensembling: An Adaptive Blending Approach for Handling Domain Heterogeneity

Prasad Patil (Co-Author), Boston University

Kevin Lane (Co-Author), Boston University, Department of Environmental Health

Daniel Kojis (First Author, Presenting Author)

Monday, Aug 4: 11:45 AM - 11:50 AM
1598 
Contributed Speed 
Music City Center 
In multi-domain settings, where observations come from distinct but related data sources, heterogeneity often exists across domains due to shifts in data distributions. Under high heterogeneity, (1) training individual models on each domain and ensembling their predictions (the ensemble approach) has been shown to outperform (2) combining the domain datasets and fitting a single model (the merged approach). However, it is less clear how to determine which approach to choose. This paper presents Multi-Study Adaptive Blend (MSAB), a method that adaptively combines predictions from the merged and ensemble approaches across varying levels of heterogeneity. First, we provide theoretical insights on optimizing the combination weight in a linear model setting. Second, we propose a domain-wise cross-validation strategy for estimating the optimal blending weight as a practical, data-driven approach for broader applications. For a given heterogeneity level, MSAB performs comparably to or better than the better individual strategy (merged or ensemble), offering robust performance across low- and high-heterogeneity settings. MSAB offers potential improvements in predictive performance and mitigates the risk of selecting a suboptimal approach in multi-domain settings.
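As a rough illustration of the blending idea described above (not the authors' exact MSAB estimator), the sketch below combines merged and ensemble predictions with a single weight and chooses that weight by domain-wise (leave-one-domain-out) cross-validation. The base learner, the weight grid, and all function names are illustrative assumptions.

# A minimal sketch, assuming each domain is an (X, y) pair of NumPy arrays and
# ridge regression as the base learner; names and the weight grid are
# illustrative, not the authors' implementation.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error

def fit_merged(domains):
    # Merged approach: pool all domains and fit a single model.
    X = np.vstack([X_d for X_d, _ in domains])
    y = np.concatenate([y_d for _, y_d in domains])
    return Ridge().fit(X, y)

def fit_ensemble(domains):
    # Ensemble approach: fit one model per domain.
    return [Ridge().fit(X_d, y_d) for X_d, y_d in domains]

def ensemble_predict(models, X):
    # Average the per-domain models' predictions.
    return np.mean([m.predict(X) for m in models], axis=0)

def estimate_blend_weight(domains, grid=np.linspace(0.0, 1.0, 21)):
    # Domain-wise cross-validation: hold out each domain, refit both
    # approaches on the remaining domains, and score every candidate
    # blending weight on the held-out domain.
    errors = np.zeros_like(grid)
    for k, (X_te, y_te) in enumerate(domains):
        train = domains[:k] + domains[k + 1:]
        p_m = fit_merged(train).predict(X_te)
        p_e = ensemble_predict(fit_ensemble(train), X_te)
        for i, w in enumerate(grid):
            errors[i] += mean_squared_error(y_te, w * p_m + (1.0 - w) * p_e)
    return grid[np.argmin(errors)]

def blend_predict(domains, X_new):
    # Blend merged and ensemble predictions with the estimated weight.
    w = estimate_blend_weight(domains)
    p_m = fit_merged(domains).predict(X_new)
    p_e = ensemble_predict(fit_ensemble(domains), X_new)
    return w * p_m + (1.0 - w) * p_e

In the linear-model setting mentioned in the abstract, the grid search could presumably be replaced by an analytically optimized weight; the grid-plus-cross-validation version above corresponds to the practical, data-driven strategy the abstract describes.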

Keywords

machine learning

domain generalization

ensemble learning

multi-study prediction 

Main Sponsor

Section on Statistical Learning and Data Science