Generative AI for Enhancing Real-World Data Quality in Hybrid Controls

Margaret Gamalo Co-Author
Pfizer
 
Yuxi Zhao Co-Author
Pfizer
 
Abhishek Bhattacharjee Co-Author
FDA
 
Margaret Gamalo Speaker
Pfizer
 
Tuesday, Aug 5: 11:25 AM - 11:50 AM
Invited Paper Session 
Music City Center 
High-quality real-world data (RWD) are essential for reliable analysis, yet missing values, ambiguous entries, and chronological misalignments arise frequently. In asthma and COPD research using Optum EHR claims data, RWD supports eligibility criteria refinement, power validation, and identification of key populations. However, restricting the analysis to complete cases can introduce selection bias, and traditional approaches such as mean and median imputation cannot capture the complexity of RWD. Generative AI methods, including autoencoders (AEs), variational autoencoders (VAEs), and generative adversarial networks (GANs), offer more robust solutions because they learn intricate relationships among variables. AEs and VAEs reconstruct data from a latent space, with VAEs additionally learning a flexible distribution over that space. GANs impute by generating synthetic values to fill the gaps. Beyond imputation, these generative models detect anomalies by comparing reconstructed records with the observed ones, while Bayesian networks model conditional dependencies and flag low-likelihood records as potential errors. With enhanced RWD, more advanced analyses become feasible: Virtual Twins combine machine learning and causal inference to pinpoint subgroups, Bayesian networks map data dependencies transparently, and deep learning integrates unstructured data, refining clinical trial screening and design.
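
The reconstruction idea behind both the imputation and the anomaly detection described above can be illustrated with a minimal sketch. It is not the authors' method: a linear autoencoder computed by truncated SVD stands in for the AE/VAE/GAN models, the data are synthetic rather than Optum records, and the names `low_rank_reconstruct`, `impute`, and `reconstruction_error` are hypothetical helpers introduced only for this illustration.

```python
import numpy as np

def low_rank_reconstruct(X, k):
    """Encode rows of X into k latent dimensions and decode them back.

    A rank-k SVD is the optimal *linear* autoencoder; it is used here only
    as a simple stand-in for a trained neural AE or VAE.
    """
    mu = X.mean(axis=0)
    U, s, Vt = np.linalg.svd(X - mu, full_matrices=False)
    Z = U[:, :k] * s[:k]           # latent codes (encoder output)
    return Z @ Vt[:k] + mu         # reconstruction (decoder output)

def impute(X_obs, k=2, n_iter=50):
    """Fill NaN cells by repeatedly reconstructing and re-filling them."""
    missing = np.isnan(X_obs)
    col_means = np.nanmean(X_obs, axis=0)
    X_filled = np.where(missing, col_means, X_obs)    # start from mean imputation
    for _ in range(n_iter):
        X_hat = low_rank_reconstruct(X_filled, k)
        X_filled = np.where(missing, X_hat, X_obs)    # observed cells are never changed
    return X_filled

def reconstruction_error(X, k=2):
    """Per-record Euclidean distance between a record and its reconstruction."""
    return np.sqrt(((X - low_rank_reconstruct(X, k)) ** 2).sum(axis=1))

# Synthetic data (assumption): 300 records, 6 correlated variables driven by 2 latent factors.
rng = np.random.default_rng(0)
latent = rng.normal(size=(300, 2))
loadings = np.array([[1.0, 1.0, 1.0, 0.0, 0.0, 0.0],
                     [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]])
X_true = latent @ loadings + 0.05 * rng.normal(size=(300, 6))

# Imputation: hide about 15% of the cells completely at random, then compare methods.
mask = rng.random(X_true.shape) < 0.15
X_obs = X_true.copy()
X_obs[mask] = np.nan

X_ae = impute(X_obs, k=2)
X_mean = np.where(mask, np.nanmean(X_obs, axis=0), X_obs)

rmse_ae = np.sqrt(np.mean((X_ae[mask] - X_true[mask]) ** 2))
rmse_mean = np.sqrt(np.mean((X_mean[mask] - X_true[mask]) ** 2))
print(f"RMSE on hidden cells: reconstruction={rmse_ae:.3f}, mean={rmse_mean:.3f}")

# Anomaly detection: corrupt one record so it violates the learned dependencies.
X_bad = X_true.copy()
X_bad[0] += np.array([6.0, -6.0, 0.0, 0.0, 0.0, 0.0])
err = reconstruction_error(X_bad, k=2)
print("Record with the largest reconstruction error:", int(np.argmax(err)))
```

Mean imputation ignores the correlations among variables, whereas the reconstruction-based fill borrows information from the observed cells of the same record. The same reconstruction error that drives the imputation also ranks records for anomaly review. A nonlinear AE, VAE, or GAN would replace `low_rank_reconstruct` while leaving the surrounding logic largely unchanged.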

Keywords

Bayesian networks

variational autoencoders (VAEs)

generative adversarial networks (GANs)

virtual twins