Navigating Electronic Health Record Data for Deeper Insights: Case Studies and Statistical Solutions

Zheyu Wang Chair
Johns Hopkins University
 
Zheyu Wang Organizer
Johns Hopkins University
 
Thursday, Aug 8: 8:30 AM - 10:20 AM
1281 
Invited Paper Session 
Oregon Convention Center 
Room: CC-D137 

Applied: Yes

Main Sponsor

Biometrics Section

Co Sponsors

Caucus for Women in Statistics
Health Policy Statistics Section
Society for Medical Decision Making

Presentations

Bootstrapping-based approaches to estimate the frequency, duration and risk factors for diagnostic delays

We develop a bootstrapping-based approach to estimate a range of statistical measures of diagnostic delays and missed diagnostic opportunities from large administrative data sources or electronic health records. Our approach uses the observed and expected patterns of healthcare visits with signs and symptoms of a disease during the period before the initial diagnosis, when diagnostic delays are expected to occur. To account for uncertainty in distinguishing a diagnostic opportunity from coincidental symptoms, resampling is used to randomly select which individual visits represent a diagnostic delay in a given trial. We describe three resampling algorithms that can be used to select which healthcare visits represent a diagnostic delay, based on the specific clinical characteristics of a given disease.

Using this approach, we estimate individual-level metrics that summarize the frequency and duration of diagnostic delays. We also use this procedure to model patient and healthcare setting risk factors for diagnostic delay. We apply our approach to a wide range of infectious and non-infectious diseases and summarize how each of our proposed algorithms impact t 

Speaker

Aaron Miller, University of Iowa

Harmonizing Electronic Health Record and Claims Data Across FDA Sentinel Initiative Data Partners: Case Study and Lessons Learned

The US Food and Drug Administration (FDA) Sentinel Initiative is a national surveillance system with a distributed data network of electronic health records (EHR) and claims data on >100 million patient lives from 17 data partners to monitor the safety of FDA-regulated medical products. The Sentinel System uses the Sentinel Common Data Model to standardize data elements and unify the medical coding "vocabulary" across participating sites. However, the coding "dialect" (i.e., the use and interpretation of codes) may still differ due to heterogeneity in care practice and financial drivers. With increasingly diverse data partners and medical coding systems, the ways in which a clinical concept can be coded vary more and more. Existing manually curated medical code ontologies and mappings are error-prone and do not scale. Data sharing constraints bring additional challenges. In this talk, we present data-driven and privacy-preserving statistical methods for detecting and reducing coding differences between healthcare systems. We share our findings from a case study of data harmonization between two Sentinel data partners in a diabetic population.

Speaker

Xu Shi

Quantifying Misdiagnosis-related harm leveraging health record data and through mixture-model-based novel measures

Investigating and monitoring misdiagnosis-related harm is crucial for improving health care. However, this effort has traditionally focused on the chart review process, which is labor intensive, potentially unstable, and does not scale well. To monitor medical institutes' diagnostic performance and identify areas for improvement in a timely fashion, researchers have proposed leveraging the relationship between symptoms and diseases based on electronic health records or claims data. Specifically, the elevated disease risk following a false-negative diagnosis can be used to signal potential harm. We proposed a mixture regression model, together with related harm measures and profiling analysis procedures, to quantify, evaluate, and compare misdiagnosis-related harm across institutes with potentially different patient population compositions. We studied the performance of the proposed methods through simulation studies. We then illustrated the methods through data analyses on stroke occurrence data from the Taiwan Longitudinal Health Insurance Database. From the analyses, we quantitatively evaluated risk factors for being harmed due to misdiagnosis, which unveiled some insights for health care quality 

Speaker

Yuxin Zhu, Johns Hopkins University

Statistical challenges in development of equitable approaches to risk-guided cancer screening using EHR data

Personalized medicine holds the promise of improving population health by targeting interventions to those individuals most likely to experience benefits. To support personalized decision making, risk-guided medical practice uses prediction models to identify individuals at high probability of experiencing an outcome of interest or of benefiting from additional medical intervention. However, when these risk models are constructed using data from electronic health records, medically underserved and historically marginalized populations may not experience the anticipated benefits of personalization due to underrepresentation in risk-model development data and poorer quality data. These challenges can lead to poorer risk model performance and perpetuation of historical inequities in health outcomes. In this talk, we explore statistical challenges in risk-guided cancer screening that arise from selection bias and differential outcome ascertainment in underrepresented race and ethnicity groups, using the example of risk-targeted breast cancer screening, and propose alternative approaches to minimize these biases.

Speaker

Rebecca Hubbard, Brown University