Calibrated Digital Twins: Improving RCT Analysis with Distributionally Shifted RWD

Huiyuan Wang (Speaker)
University of Pennsylvania
 
Thursday, Aug 7: 8:35 AM - 8:55 AM
Topic-Contributed Paper Session 
Music City Center 
Randomized controlled trials (RCTs) provide internally valid estimates of treatment effects but are often costly and underpowered. In contrast, real-world data (RWD) offer large-scale, passively collected information that can improve efficiency when integrated appropriately. Recent methods have used predictive models trained on RWD (so-called digital twins) to generate individualized outcome predictions and augment RCT analysis. However, when the RWD and RCT populations differ, naively applying these external models can induce model-shift bias and efficiency loss, undermining the validity of causal conclusions.
We propose a new framework that combines RWD-based digital twin modeling with a calibration step using auxiliary outcomes. This calibration adjusts for systematic discrepancies between the trial and real-world populations, enabling valid and efficient treatment effect estimation in hybrid trial designs. Our approach generalizes classical covariate-adjusted regression and prediction-powered inference to settings with distributional shift between data sources. Theoretically, we show that the proposed estimator is consistent and, under a directional alignment condition, has asymptotic variance no larger than that of the unadjusted baseline. Notably, even if this condition fails, the estimator remains asymptotically unbiased. Empirically, we demonstrate substantial gains in statistical efficiency and robustness through simulations and an application to multi-center neuroimaging data, while maintaining the internal validity of the original RCT.
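To make the "classical covariate-adjusted regression" baseline concrete, the sketch below compares the unadjusted difference in means with a regression-adjusted estimator that uses a digital-twin prediction as its only covariate, centered and interacted with treatment in the style of Lin (2013). This is not the authors' calibrated estimator: the auxiliary-outcome calibration step is not specified in the abstract and is not implemented here. The data-generating model, the deliberately miscalibrated "twin" (wrong intercept and slope, standing in for a model trained on shifted RWD), and all function names are hypothetical assumptions for illustration only.

```python
import numpy as np

def diff_in_means(y, a):
    # Unadjusted baseline: difference of treated and control arm means.
    return y[a == 1].mean() - y[a == 0].mean()

def twin_adjusted_ate(y, a, twin):
    # Regression adjustment with the twin prediction as the sole covariate,
    # centered at its sample mean and interacted with treatment (Lin, 2013).
    # With this centering, the coefficient on `a` estimates the average effect.
    # The slope on the twin is re-estimated within the trial, so a linearly
    # miscalibrated twin is rescaled rather than trusted as-is.
    t = twin - twin.mean()
    X = np.column_stack([np.ones_like(t), a, t, a * t])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1]

def simulate_trial(rng, n=200, tau=1.0):
    # Hypothetical trial: one baseline covariate, 1:1 complete randomization.
    x = rng.normal(size=n)
    a = rng.permutation(np.repeat([0, 1], n // 2)).astype(float)
    y = 1.0 + 2.0 * x + tau * a + rng.normal(size=n)
    # Hypothetical "digital twin" from shifted RWD: biased intercept and slope.
    twin = 0.5 + 1.5 * x
    return y, a, twin

rng = np.random.default_rng(0)
dm, adj = [], []
for _ in range(2000):
    y, a, twin = simulate_trial(rng)
    dm.append(diff_in_means(y, a))
    adj.append(twin_adjusted_ate(y, a, twin))
dm, adj = np.array(dm), np.array(adj)

# Both estimators should center near the true effect tau = 1.0;
# the twin-adjusted one should show a clearly smaller Monte Carlo variance.
print(f"diff-in-means: mean={dm.mean():.3f}, var={dm.var():.4f}")
print(f"twin-adjusted: mean={adj.mean():.3f}, var={adj.var():.4f}")
```

Because the twin here is a linear function of the covariate, in-trial refitting fully absorbs its miscalibration. The distributional-shift problems the abstract targets arise when the twin's errors are not correctable this way, which is where a dedicated calibration step would be needed.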