
Generative AI for Enhancing Real-World Data Quality in Hybrid Controls

Presented During: Revolutionizing Drug Development: Harnessing Real-World Data in Hybrid Trial Designs

Margaret Gamalo Co-Author
Pfizer

Yuxi Zhao Co-Author
Pfizer

Abhishek Bhattacharjee Co-Author
FDA

Margaret Gamalo Speaker
Pfizer

Tuesday, Aug 5: 11:25 AM - 11:50 AM
Invited Paper Session

Music City Center

High-quality real-world data (RWD) is essential for reliable analysis, yet missing values, ambiguous entries, and chronological misalignments arise frequently. In asthma and COPD research using Optum EHR claims data, RWD supports eligibility criteria refinement, power validation, and identification of key populations. However, restricting the analysis to complete cases when data are missing can introduce selection bias. Traditional imputation methods, such as mean and median imputation, cannot capture the complexity of RWD. Advanced AI methods, such as autoencoders (AEs), variational autoencoders (VAEs), and generative adversarial networks (GANs), offer more robust solutions by capturing intricate relationships in the data. AEs and VAEs reconstruct data from a latent space, with VAEs additionally learning a probability distribution over that space. GANs further improve imputation by generating synthetic data to fill gaps. Beyond imputation, these generative AI models detect anomalies by comparing reconstructed data with the observed data, while Bayesian networks, which model conditional dependencies, flag low-likelihood records as potential errors. With enhanced RWD, advanced analyses become feasible: Virtual Twins use machine learning and causal inference to pinpoint subgroups, Bayesian networks map data dependencies transparently, and deep learning integrates unstructured data, refining clinical trial screening and design.
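
The two ideas in the abstract, imputing from learned relationships rather than from a column mean, and flagging anomalies by comparing reconstructed with observed values, can be illustrated with a deliberately small sketch. This is not the authors' method: a least-squares line stands in for the autoencoder, VAE, or GAN so that the example runs with the Python standard library alone. The function names (fit_line, impute, flag_anomalies), the threshold, and the toy records are all assumptions made up for illustration.

```python
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b * x; returns (a, b)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def impute(records, strategy="model"):
    """Fill missing y values (None) in a list of (x, y) records.

    strategy="mean"  : traditional mean imputation, ignores x.
    strategy="model" : predict y from x (toy stand-in for a generative model).
    """
    complete = [(x, y) for x, y in records if y is not None]
    xs, ys = zip(*complete)
    if strategy == "mean":
        y_bar = mean(ys)
        fill = lambda x: y_bar
    else:
        a, b = fit_line(xs, ys)
        fill = lambda x: a + b * x
    return [(x, y if y is not None else fill(x)) for x, y in records]

def flag_anomalies(records, threshold):
    """Return complete records whose reconstruction error exceeds threshold.

    The "reconstruction" is the fitted value a + b * x; the error is the
    absolute difference between it and the observed y.
    """
    complete = [(x, y) for x, y in records if y is not None]
    a, b = fit_line(*zip(*complete))
    return [(x, y) for x, y in complete if abs(y - (a + b * x)) > threshold]

# Invented toy records in which y depends on x; None marks a missing value.
toy = [(1, 2), (2, 4), (3, None), (4, 8), (5, 10), (6, None)]
print("mean imputation: ", impute(toy, strategy="mean"))
print("model imputation:", impute(toy, strategy="model"))

# Invented toy records with one implausible value at x = 5.
noisy = [(1, 2), (2, 4), (3, 6), (4, 8), (5, 30), (6, 12)]
print("flagged:", flag_anomalies(noisy, threshold=10))
```

In this toy setting, mean imputation assigns the same value to every missing entry regardless of x, whereas the model-based fill follows the relationship between the two variables. A real autoencoder or VAE would play the same role across many variables at once, and the threshold for flagging records would need to be chosen from the data rather than fixed by hand as it is here.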

Keywords

Bayesian networks

variational autoencoders (VAEs)

generative adversarial networks (GANs)

virtual twins