Generative AI for Enhancing Real-World Data Quality in Hybrid Controls
Tuesday, Aug 5: 11:25 AM - 11:50 AM
Invited Paper Session
Music City Center
High-quality real-world data (RWD) is essential for reliable analysis, yet problems such as missing data, ambiguity, and chronological misalignment arise frequently. In asthma and COPD research using Optum EHR claims data, RWD supports eligibility criteria refinement, power validation, and identification of key populations. However, handling missing data by restricting the analysis to complete cases can introduce selection bias. Simple imputation methods, such as mean and median imputation, ignore relationships among variables and are therefore limited in addressing RWD's complexity. Advanced AI methods, such as autoencoders (AEs), variational autoencoders (VAEs), and generative adversarial networks (GANs), offer more robust solutions by capturing intricate relationships in the data. AEs and VAEs reconstruct data from a latent space, with VAEs additionally learning a flexible distribution over that space. GANs further improve imputation by generating synthetic values to fill gaps. Beyond imputation, these generative models detect anomalies by comparing reconstructed data with the observed data, while Bayesian networks, which model conditional dependencies among variables, flag low-likelihood records as potential errors. With enhanced RWD, more advanced analyses become feasible: Virtual Twins use machine learning and causal inference to pinpoint subgroups, Bayesian networks map data dependencies transparently, and deep learning integrates unstructured data, refining clinical trial screening and design.
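The reconstruction idea in the abstract can be illustrated with a minimal sketch. It is not the authors' method: it replaces the nonlinear AE/VAE with a one-dimensional, tied-weight linear autoencoder (equivalent to the first principal component), fitted by power iteration using only the Python standard library. The data are synthetic, and the three columns and all function names are hypothetical. A missing value is filled from the reconstruction, and the squared reconstruction error on observed values serves as an anomaly score.

```python
import math

def fit_linear_autoencoder(rows, n_iter=100):
    """Fit a 1-D tied-weight linear autoencoder on complete rows.

    Returns the column means and a unit weight vector (the first
    principal component), found by power iteration on the covariance.
    """
    n, d = len(rows), len(rows[0])
    mu = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - mu[j] for j in range(d)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / n for j in range(d)]
           for i in range(d)]
    w = [1.0] * d
    for _ in range(n_iter):
        w = [sum(cov[i][j] * w[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(v * v for v in w))
        w = [v / norm for v in w]
    return mu, w

def encode(record, mu, w):
    """Least-squares latent code computed from observed (non-None) entries only."""
    obs = [j for j, v in enumerate(record) if v is not None]
    num = sum(w[j] * (record[j] - mu[j]) for j in obs)
    den = sum(w[j] ** 2 for j in obs)
    return num / den

def reconstruct(z, mu, w):
    """Decode a latent code back to the original feature space."""
    return [m + z * v for m, v in zip(mu, w)]

def impute(record, mu, w):
    """Keep observed values; fill missing ones from the reconstruction."""
    recon = reconstruct(encode(record, mu, w), mu, w)
    return [recon[j] if v is None else v for j, v in enumerate(record)]

def anomaly_score(record, mu, w):
    """Squared reconstruction error over the observed entries."""
    recon = reconstruct(encode(record, mu, w), mu, w)
    return sum((v - r) ** 2 for v, r in zip(record, recon) if v is not None)

# Synthetic complete records with three strongly related, hypothetical columns.
train = [[10.0 + t, 50.0 + 2.0 * t, 30.0 - t] for t in range(-5, 6)]
mu, w = fit_linear_autoencoder(train)

# Imputation: the second value is missing and is filled from the other two.
print(impute([12.0, None, 28.0], mu, w))

# Anomaly detection: a consistent record scores near zero, an inconsistent one high.
print(anomaly_score([12.0, 54.0, 28.0], mu, w))
print(anomaly_score([12.0, 40.0, 28.0], mu, w))
```

Unlike mean imputation, which would fill the gap with the column average regardless of the other fields, the reconstruction uses the observed values of the same record. A nonlinear AE, VAE, or GAN would replace the linear encoder and decoder while keeping the same fill-and-score pattern.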
Bayesian networks
variational autoencoders (VAEs)
generative adversarial networks (GANs)
virtual twins