From reactive to proactive medicine: Statisticians’ evolving roles in the learning health system

Chair: Gustavo Amorim, Vanderbilt University Medical Center
Discussant: Bhramar Mukherjee, University of Michigan
Organizer: Sarah Lotspeich, Wake Forest University

Wednesday, Aug 6: 8:30 AM - 10:20 AM
Session 0184: Invited Paper Session
Music City Center, Room CC-102A

Applied

Main Sponsor

ENAR

Co-Sponsors

Biometrics Section
Caucus for Women in Statistics

Presentations

Considering the Totality of Evidence: Integrating EHR and Primary Data Collection to Improve Medical Decision Making

Randomized clinical trials (RCTs) are considered the gold standard for evaluating treatment efficacy but suffer from limitations. Vulnerable populations are under-represented in RCTs, raising concerns about equity and the external validity of results. Additionally, clinical trials may not reflect care and outcomes as they are experienced in routine practice and can be slow to provide timely evidence in the face of rapidly evolving public health crises. Electronic health records (EHR) have the potential to address many of these concerns. However, limitations of EHR data necessitate careful attention to study design and statistical methods. Informed by my experience collaborating with oncologists to advance understanding of cancer treatment utilization and effectiveness, I will discuss the potential and methodological challenges of using EHR data to supplement RCTs with more timely and more generalizable evidence on treatment efficacy.  

Keywords

electronic health records

RCT

missing data

measurement error 

Speaker

Rebecca Hubbard, Brown University

Building Unbiased Risk Prediction Models: A Matching Approach with Biased Electronic Health Record Data

Risk prediction models using electronic health record (EHR) data can identify high-risk patients early, enabling timely, personalized care and better resource allocation. Because EHR data are collected primarily for clinical rather than research purposes, they often lack crucial predictors for accurate risk prediction. Risk assessment can therefore benefit from incorporating predictors from external sources. To evaluate the added value of external predictors, we compare prediction accuracy metrics between models using only EHR predictors and models using both EHR and external predictors. However, this evaluation may be biased if the external source data do not represent the target population. To address this issue, we developed a semiparametric method that assumes the availability of a base model generating unbiased risk estimates in the target population using only EHR predictors; it ensures calibration of the enriched model by using risk estimates from the base model as constraints during likelihood maximization. The prediction accuracy of the resulting model, however, depends on the extent to which the sample data deviate from the target population. We therefore developed a matching approach that selects a subset of the external source data that resembles the target population, and we apply the semiparametric method to the matched sample to derive the risk prediction model. While maintaining calibration, our method achieves greater prediction accuracy for evaluating the external predictors. We assessed the proposed method via simulations and an application to a breast cancer study using Penn Biobank data.  
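The covariate-matching step this abstract describes can be sketched in a few lines. The following is an illustrative nearest-neighbor matcher on shared EHR predictors, not the authors' implementation; the simulated data, Euclidean distance, and one-to-one matching without replacement are all assumptions made for the example.

```python
import math
import random

def match_external_to_target(target, external):
    """For each target record, select its nearest unused external record
    by Euclidean distance on shared EHR predictors (illustrative
    one-to-one matching without replacement)."""
    matched, used = [], set()
    for t in target:
        best = min(
            (i for i in range(len(external)) if i not in used),
            key=lambda i: math.dist(t, external[i]),
        )
        used.add(best)
        matched.append(external[best])
    return matched

random.seed(1)
# target population: two standardized EHR predictors, centered at 0
target = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(20)]
# external source: first predictor shifted, i.e. a biased sample
external = [(random.gauss(1, 1), random.gauss(0, 1)) for _ in range(200)]

matched = match_external_to_target(target, external)
shift_before = abs(sum(x for x, _ in external) / len(external))
shift_after = abs(sum(x for x, _ in matched) / len(matched))
# matching shrinks the covariate shift relative to the target mean (~0)
```

In this toy setup the matched subset's mean on the shifted predictor moves toward the target population's mean, which is the property the matched sample needs before the calibration-constrained model is fit.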

Co-Author

Jinbo Chen, University of Pennsylvania

Speaker

Le Wang

Assessing treatment effects in observational data with missing or mismeasured confounders: A comparative study of practical doubly-robust and traditional missing data methods

For safety and rare-outcome studies in pharmacoepidemiology, multiple large databases are often merged to improve statistical power and create a more generalizable cohort. Medical claims data have become a mainstay in evaluating the safety and effectiveness of medications post-approval, but confounders derived from administrative data can be prone to measurement error. Electronic health records (EHR) data, or data abstracted from chart review, offer more granular patient data than medical claims, but the gold-standard exposure data may only be available on a subset of patients. I will discuss two practical-to-implement doubly-robust estimators for this setting, one relying on a type of survey calibration and another utilizing targeted maximum likelihood estimation (TMLE), and compare their performance with that of more traditional missing data methods in a detailed numerical study. The numerical work includes plasmode simulation studies that emulate the complex data structure of a real, large EHR cohort to compare antidepressant therapies in a setting where a key confounder is prone to missingness.  
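To make the doubly-robust idea concrete, here is a minimal augmented inverse-probability-weighted (AIPW) sketch of an average treatment effect. This is a generic illustration of double robustness, not the survey-calibration or TMLE estimators from the talk, and the simulated data and model choices are invented for the example: the outcome models are deliberately misspecified (set to zero), yet the estimate stays near the true effect because the propensity model is correct.

```python
import math
import random

random.seed(7)
n = 50_000
# simulate: confounder x, treatment a, outcome y with true effect = 2
data = []
for _ in range(n):
    x = random.gauss(0, 1)
    p = 1 / (1 + math.exp(-x))          # true propensity score
    a = 1 if random.random() < p else 0
    y = 2 * a + x + random.gauss(0, 1)
    data.append((x, a, y))

def aipw_ate(data, prop, mu1, mu0):
    """Augmented IPW estimator of the average treatment effect:
    consistent if EITHER the propensity model or the outcome
    models (mu1, mu0) are correctly specified."""
    total = 0.0
    for x, a, y in data:
        p = prop(x)
        total += (mu1(x) - mu0(x)
                  + a * (y - mu1(x)) / p
                  - (1 - a) * (y - mu0(x)) / (1 - p))
    return total / len(data)

# correct propensity model, deliberately wrong outcome models (always 0):
ate = aipw_ate(data,
               prop=lambda x: 1 / (1 + math.exp(-x)),
               mu1=lambda x: 0.0,
               mu0=lambda x: 0.0)
```

Swapping in a wrong propensity model while supplying correct outcome models would likewise recover the effect; that two-way protection is what "doubly robust" means here.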

Keywords

doubly-robust methods

missing data

electronic health records

targeted maximum likelihood estimation

generalized raking

survey calibration 

Co-Author(s)

Brian Williamson, Kaiser Permanente Washington Health Research Institute
Chloe Krakauer
Eric Johnson, Kaiser Permanente Washington Health Research Institute
Susan Gruber, TL Revolution, LLC
Bryan Shepherd, Vanderbilt University School of Medicine
Mark van der Laan, UC Berkeley
Thomas Lumley, University of Auckland
Hana Lee, Food and Drug Administration
José Hernández-Muñoz, Food and Drug Administration
Fengyu Zhao, FDA, CDER
Sarah Dutcher, Food and Drug Administration
Rishi Desai, Brigham and Women’s Hospital, Harvard Medical School
Gregory Simon, Kaiser Permanente Washington Health Research Institute
Susan M Shortreed, Kaiser Permanente Washington Health Research Institute
Jennifer Nelson, Kaiser Permanente Washington Health Research Institute

Speaker

Pamela Shaw, Kaiser Permanente Washington Health Research Institute

Targeted partial validation to make EHR data au-dit they can be: Correcting for data quality issues in the learning health system

The allostatic load index (ALI) is an informative summary of whole-person health that is predictive of downstream health outcomes. The ALI uses biomarker data to measure cumulative stress on five systems in the body for the general adult population. Borrowing data from electronic health records (EHR) is a promising opportunity to estimate the ALI and potentially identify at-risk patients on a large scale. However, routinely collected EHR data may contain missingness and errors, and ignoring these data quality issues could lead to biased statistical results and incorrect clinical decisions. Validation of EHR data (e.g., through chart reviews) can provide better-quality data, but realistically only a subset of patients' data can be validated. Thus, we devise a targeted study design ("targeted audit") to harness the error-prone surrogates from the EHR to identify the most informative patient records for validation. Specifically, the targeted audit design seeks the best statistical precision to quantify the association between ALI and healthcare utilization in logistic regression. In this talk, we detail the process of the targeted audit design and its application to EHR data from Atrium Health Wake Forest Baptist. 
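One simple way to spread a fixed validation budget, in the spirit of (though not identical to) the precision-targeted audit design described above, is Neyman-type allocation across strata defined by the error-prone EHR surrogate: strata that are larger or more variable receive more of the chart-review budget. The stratum counts, variability estimates, and budget below are hypothetical numbers invented for illustration.

```python
def neyman_allocation(strata, budget):
    """Allocate a fixed validation budget across surrogate-defined
    strata in proportion to N_h * S_h (stratum size times estimated
    variability), capping each allocation at the stratum's size."""
    weights = {h: n * s for h, (n, s) in strata.items()}
    total = sum(weights.values())
    return {h: min(round(budget * w / total), strata[h][0])
            for h, w in weights.items()}

# hypothetical strata from an error-prone EHR surrogate:
#   name: (number of patient records, estimated outcome SD)
strata = {"low": (5000, 0.2), "mid": (3000, 0.5), "high": (500, 0.9)}
alloc = neyman_allocation(strata, budget=300)
# most audit effort goes to the "mid" stratum: it is both large and variable
```

A full targeted audit would instead optimize the design-based variance of the logistic regression coefficient of interest, but the same intuition applies: the surrogate tells you where each validated record buys the most precision.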

Speaker

Sarah Lotspeich, Wake Forest University