
Building Unbiased Risk Prediction Models: A Matching Approach with Biased Electronic Health Record Data

Presented During: From reactive to proactive medicine: Statisticians’ evolving roles in the learning health system

Jinbo Chen Co-Author
University of Pennsylvania

Le Wang Speaker

Wednesday, Aug 6: 8:55 AM - 9:15 AM
Invited Paper Session

Music City Center

Risk prediction models built on Electronic Health Record (EHR) data can identify high-risk patients early, enabling timely, personalized care and better resource allocation. Because EHR data are collected primarily for clinical rather than research purposes, they often lack predictors that are crucial for accurate risk prediction. Incorporating risk predictors from external sources can therefore improve risk assessment. To evaluate the added value of external predictors, we compare prediction accuracy metrics between models using only EHR predictors and models including both EHR and external predictors. This evaluation may be biased, however, if the source data do not represent the target population. To address this issue, a semiparametric method was developed that assumes the availability of a base model generating unbiased risk estimates in the target population using only EHR predictors. It ensures calibration of the enriched model by using risk estimates from the base model as constraints during likelihood maximization. However, the prediction accuracy of the resulting model depends on the extent to which the source data deviate from the target population. To reduce this dependence, we developed a matching approach that selects a subset of the external source data that resembles the target population. We then apply the semiparametric method to the matched sample to derive the risk prediction model. While maintaining calibration, our method achieves greater prediction accuracy when evaluating the external predictors. We assessed the proposed method via simulations and an application to a breast cancer study using Penn Biobank data.
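
The abstract does not specify how the matching is carried out. As a rough illustration only, the sketch below shows one generic way a source sample could be thinned to resemble a target population: greedy 1:1 nearest-neighbour matching on the base-model risk estimate. The matching variable, the caliper, the function name `match_on_base_risk`, and the simulated Beta-distributed risks are all assumptions made for this example; they are not taken from the presented method.

```python
import random


def match_on_base_risk(target_risks, source_risks, caliper=0.05):
    """Greedy 1:1 nearest-neighbour matching without replacement.

    For each target risk, pick the closest unused source record and keep
    it only if its base-model risk lies within `caliper` of the target
    risk. Returns the indices of the matched source records.
    (Hypothetical helper; not the authors' procedure.)
    """
    available = set(range(len(source_risks)))
    matched = []
    for r in target_risks:
        if not available:
            break
        best = min(available, key=lambda i: abs(source_risks[i] - r))
        if abs(source_risks[best] - r) <= caliper:
            matched.append(best)
            available.remove(best)
    return matched


def mean(xs):
    return sum(xs) / len(xs)


# Simulated (made-up) base-model risks: the source sample is assumed to be
# enriched for higher-risk individuals relative to the target population.
random.seed(0)
target = [random.betavariate(2, 8) for _ in range(200)]
source = [random.betavariate(3, 5) for _ in range(1000)]

idx = match_on_base_risk(target, source)
matched = [source[i] for i in idx]

print("mean base risk, target :", round(mean(target), 3))
print("mean base risk, source :", round(mean(source), 3))
print("mean base risk, matched:", round(mean(matched), 3))
```

In this toy setup the matched subset's mean base risk lies much closer to the target's than the full source sample's does, which is the sense in which a matched sample "resembles" the target population. The constrained likelihood step described in the abstract would then be applied to the matched records; that step is not sketched here.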