
Merging Versus Ensembling: An Adaptive Blending Approach for Handling Domain Heterogeneity

Presented During: SPEED 3: Statistical Methods for High Dimensional and Complex Data, Part 1

Prasad Patil Co-Author
Boston University

Kevin Lane Co-Author
Boston University - Department of Environmental Health

Daniel Kojis First Author

Daniel Kojis Presenting Author

Monday, Aug 4: 11:45 AM - 11:50 AM
1598
Contributed Speed

Music City Center

In multi-domain settings, where observations come from distinct but related data sources, heterogeneity often exists across domains due to shifts in data distributions. Under high heterogeneity, (1) training a separate model on each domain and ensembling their predictions (the ensemble approach) has been shown to outperform (2) combining the domain datasets and fitting a single model (the merged approach). However, it is less clear when each approach should be chosen. This paper presents Multi-Study Adaptive Blend (MSAB), a method that optimally combines predictions from the ensemble and merged approaches, adapting across varying levels of heterogeneity. First, we provide theoretical insights into optimizing the combination weight in a linear model setting. Second, we propose a domain-wise cross-validation strategy for estimating the optimal blending weight, giving a practical, data-driven approach for broader applications. At a given heterogeneity level, MSAB performs comparably to or better than the best individual strategy (merged or ensemble), offering robust performance in both low- and high-heterogeneity settings. MSAB can improve predictive performance and mitigates the risk of selecting a suboptimal approach in multi-domain settings.
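The blending idea can be illustrated with a minimal sketch. It assumes ordinary least squares as the base learner in every domain, an equal-weight average of the per-domain models for the ensemble, and leave-one-domain-out folds for the domain-wise cross-validation. The abstract does not specify these details, so this is not the authors' MSAB implementation, and all function names (fit_ols, merged_and_ensemble, domainwise_cv_weight, simulate) are hypothetical. Because the blended prediction w * merged + (1 - w) * ensemble is linear in w, the weight that minimizes pooled held-out squared error has a closed form. The sketch clips that weight to [0, 1], which is also an assumption.

```python
import numpy as np


def fit_ols(X, y):
    # Least-squares coefficients with an intercept column prepended.
    Xc = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(Xc, y, rcond=None)
    return beta


def predict(beta, X):
    return beta[0] + X @ beta[1:]


def merged_and_ensemble(train, X_new):
    """train: list of (X, y) pairs, one per domain. Returns (merged, ensemble) predictions."""
    # Merged approach: pool all training domains and fit a single model.
    X_all = np.vstack([X for X, _ in train])
    y_all = np.concatenate([y for _, y in train])
    merged = predict(fit_ols(X_all, y_all), X_new)
    # Ensemble approach: one model per domain, predictions averaged with equal weights
    # (an assumption; the ensemble weights could also be learned).
    ensemble = np.mean([predict(fit_ols(X, y), X_new) for X, y in train], axis=0)
    return merged, ensemble


def domainwise_cv_weight(domains):
    """Leave-one-domain-out estimate of w in  w * merged + (1 - w) * ensemble."""
    num, den = 0.0, 0.0
    for k in range(len(domains)):
        train = [d for j, d in enumerate(domains) if j != k]
        X_k, y_k = domains[k]
        m, e = merged_and_ensemble(train, X_k)
        diff = m - e
        # Residual of the blend is (y - e) - w * (m - e); accumulate the normal-equation terms.
        num += diff @ (y_k - e)
        den += diff @ diff
    # Closed-form minimizer of pooled held-out squared error, clipped to [0, 1].
    return float(np.clip(num / den, 0.0, 1.0)) if den > 0 else 0.5


def simulate(n_domains, n, p, sigma_beta, rng):
    # Hypothetical random-coefficient model: each domain's coefficients
    # are shifted from a shared vector by noise of scale sigma_beta.
    beta0 = rng.normal(size=p)
    domains = []
    for _ in range(n_domains):
        beta_k = beta0 + sigma_beta * rng.normal(size=p)
        X = rng.normal(size=(n, p))
        y = X @ beta_k + rng.normal(size=n)
        domains.append((X, y))
    return domains


rng = np.random.default_rng(0)
for sigma_beta in (0.0, 1.0):
    w = domainwise_cv_weight(simulate(5, 100, 5, sigma_beta, rng))
    print(f"heterogeneity={sigma_beta}: weight on merged = {w:.2f}")
```

Here sigma_beta plays the role of the heterogeneity level. The simulation only demonstrates the mechanics of estimating a blending weight from held-out domains, and its printed weights should not be read as reproducing the paper's results.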

Keywords

machine learning

domain generalization

ensemble learning

multi-study prediction

Main Sponsor

Section on Statistical Learning and Data Science