Thursday, Aug 7: 8:30 AM - 10:20 AM
0593
Topic-Contributed Paper Session
Music City Center
Room: CC-208B
Applied
Yes
Main Sponsor
ENAR
Co Sponsors
Section on Medical Devices and Diagnostics
Section on Statistical Learning and Data Science
Presentations
Effective and rapid decision-making in clinical trials requires unbiased and precise treatment effect inferences. Recent advances in artificial intelligence (AI) are poised to enable breakthroughs in innovative trial design and analysis, improving efficiency in Phase 2 and 3. We present AI-enabled methods that combine digital twins with traditional statistical frameworks to improve trial efficiency while satisfying regulatory guidance. Digital Twin Generators (DTGs) are pre-trained generative models that create a digital twin for each trial participant using only baseline measurements and are fully prespecifiable. Digital twins are individualized, probabilistic distributions of a participant's disease progression and do not require any changes to trial conduct itself. Combined with traditional Frequentist and Bayesian frameworks, digital twins increase power and reduce sample size in trials without compromising Type I error control. Digital twins also improve internal decision making through personalized p-values for subgroup discovery and optimized composite scores. We present results of recent case studies and discuss prospective applications of these methodologies.
Keywords
AI-generated digital twins
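As an illustration only (not necessarily the presenters' exact procedure), one common way to fold a digital-twin prediction into a frequentist analysis is to use it as a prognostic covariate in an ANCOVA-style model. The sketch below uses simulated data and hypothetical variable names (`twin`, `baseline`) to show how the adjusted estimate keeps the randomization-based target while typically shrinking the standard error.

```python
# Minimal sketch (simulated data): a digital-twin prediction used as a prognostic
# covariate in an ANCOVA-style analysis. The data-generating process and variable
# names are hypothetical; this is not the presenters' exact method.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 200
baseline = rng.normal(50, 10, n)              # baseline severity score
treat = rng.integers(0, 2, n)                 # 1:1 randomization
twin = 0.8 * baseline + rng.normal(0, 3, n)   # digital-twin predicted outcome (hypothetical)
y = twin + 2.0 * treat + rng.normal(0, 5, n)  # observed outcome with a true effect of 2.0

df = pd.DataFrame({"y": y, "treat": treat, "baseline": baseline, "twin": twin})

unadj = smf.ols("y ~ treat", data=df).fit()                  # unadjusted analysis
adj = smf.ols("y ~ treat + baseline + twin", data=df).fit()  # twin-adjusted analysis
print(unadj.params["treat"], unadj.bse["treat"])  # same target, wider standard error
print(adj.params["treat"], adj.bse["treat"])      # typically narrower standard error
```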
Randomized controlled trials (RCTs) are considered the gold standard for evaluating the relative effectiveness of interventions. However, evidence from RCTs alone can be limited by lack of generalizability, small to moderate sample sizes, and limited follow-up time. Recently, several methods have been developed to integrate RCTs with real-world data (RWD). Some existing data integration methods do not account for the potential biases introduced by observational data, which arise from lack of randomization, unmeasured confounding, missingness, and selection bias. In this talk, we present a novel strategy that leverages a rich set of negative control outcomes to safely calibrate the estimate that combines evidence from the RCT and RWD, while mitigating the impact of these biases. This approach offers a promising solution for enhancing the validity and reproducibility of evidence generated by integrating RWD with RCTs.
Speaker
Jingyue Huang, Perelman School of Medicine at the University of Pennsylvania
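As a rough illustration of the idea in the abstract above, the hypothetical sketch below estimates residual bias from negative control outcomes measured in the RWD (where the true effect is zero), debiases the RWD estimate, inflates its uncertainty accordingly, and combines it with the RCT estimate by inverse-variance weighting. All numbers are made up, and this simplified estimator is not the speaker's proposed calibration method.

```python
# Hypothetical sketch: using negative control outcomes (NCOs) to gauge bias in a
# real-world-data (RWD) estimate before combining it with an RCT estimate.
# Simplified illustration only; not the speaker's proposed estimator.
import numpy as np

rct_est, rct_se = 0.30, 0.15   # RCT treatment effect estimate (assumed values)
rwd_est, rwd_se = 0.55, 0.08   # RWD estimate, possibly biased (assumed values)

# "Effects" estimated on NCOs in the RWD; the truth is zero, so any systematic
# deviation is evidence of residual bias.
nco_effects = np.array([0.20, 0.18, 0.25, 0.15, 0.22])
bias_hat = nco_effects.mean()
bias_se = nco_effects.std(ddof=1) / np.sqrt(len(nco_effects))

# Debias the RWD estimate, propagate the extra uncertainty, then combine with the
# RCT estimate by inverse-variance weighting.
rwd_cal = rwd_est - bias_hat
rwd_cal_se = np.sqrt(rwd_se**2 + bias_se**2)
w_rct, w_rwd = 1 / rct_se**2, 1 / rwd_cal_se**2
combined = (w_rct * rct_est + w_rwd * rwd_cal) / (w_rct + w_rwd)
combined_se = np.sqrt(1 / (w_rct + w_rwd))
print(combined, combined_se)
```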
Insufficient patient accrual is a major challenge in clinical trials and can result in underpowered studies as well as expose study participants to toxicity and additional costs with limited scientific benefit. Insufficient accrual can occur when a study runs out of resources or is stopped early for other reasons. We performed a retrospective analysis using ten datasets from nine fully accrued, completed, and published cancer clinical trials. For each trial, we simulated insufficient accrual and generated virtual patients to compensate for the missing participants. We then replicated the published analyses on this augmented dataset to determine whether the findings were the same. Replication of the published analyses used four metrics: decision agreement, estimate agreement, standardized difference, and confidence interval overlap. Sequential synthesis performed well on the four replication metrics when up to 40% of the last-recruited patients were removed (decision agreement: 88% to 100% across datasets; estimate agreement: 100%; cannot reject standardized difference null hypothesis: 100%; CI overlap: 0.8 to 0.92). There was no evidence of a monotonic relationship between the estimated effect size and recruitment order across these studies. This suggests that patients recruited earlier in a trial are not systematically different from those recruited later, at least partially explaining why generative models trained on early data can effectively simulate patients recruited later in a trial.
For an oncology study with insufficient accrual that reaches as little as 60% of its recruitment target, sequential synthesis can simulate the full dataset that would have been observed had the study continued accruing patients, offering an alternative to drawing conclusions from an underpowered study or abandoning the data altogether. These results provide evidence demonstrating the potential for generative models to rescue poorly accruing clinical trials.
These results are limited to oncology drug trials (surgery trials, for example, may demonstrate a learning effect over time) and did not consider safety data. Furthermore, for small trials, pre-trained generative models may provide a better alternative for simulating patients.
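For readers unfamiliar with the four replication metrics named in this abstract, the sketch below gives one plausible operationalization (definitions vary across the replication literature, so treat this as illustrative rather than the authors' exact criteria): same significance decision, augmented estimate falling inside the original confidence interval, a standardized difference between estimates, and the averaged proportional overlap of the two confidence intervals.

```python
# One plausible operationalization of the four replication metrics (illustrative only).
# Compares an original estimate/SE with the estimate/SE from the augmented dataset.
import numpy as np
from scipy.stats import norm

def replication_metrics(est_o, se_o, est_a, se_a, alpha=0.05):
    z = norm.ppf(1 - alpha / 2)
    lo_o, hi_o = est_o - z * se_o, est_o + z * se_o
    lo_a, hi_a = est_a - z * se_a, est_a + z * se_a
    # 1. Decision agreement: both analyses reach the same significance decision.
    decision = (abs(est_o / se_o) > z) == (abs(est_a / se_a) > z)
    # 2. Estimate agreement: the augmented estimate falls inside the original CI.
    estimate = lo_o <= est_a <= hi_o
    # 3. Standardized difference between the two estimates.
    std_diff = (est_o - est_a) / np.sqrt(se_o**2 + se_a**2)
    # 4. CI overlap: shared interval length relative to each CI, averaged.
    shared = max(0.0, min(hi_o, hi_a) - max(lo_o, lo_a))
    overlap = 0.5 * (shared / (hi_o - lo_o) + shared / (hi_a - lo_a))
    return decision, estimate, std_diff, overlap

print(replication_metrics(est_o=-0.45, se_o=0.20, est_a=-0.40, se_a=0.18))
```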
Longitudinal patient records are invaluable for understanding the evolving needs and outcomes of individuals with chronic diseases, such as people living with HIV. Restrictions on data sharing due to privacy concerns are impeding efforts to address the global HIV epidemic. Recent advances in generative artificial intelligence (AI) create a promising opportunity to simulate realistic synthetic HIV cohort data to address privacy challenges.
However, generating high-quality longitudinal patient records still faces major challenges. Chronic disease cohorts, like those for HIV, span decades, include diverse data types, and focus on critical time-to-event outcomes. Current methods struggle to capture this temporal depth and clinical complexity, and most are evaluated on short horizons.
To address these challenges, we introduce MeLD (Medical Longitudinal Latent Diffusion), a latent diffusion-based synthetic longitudinal data generator built on a transformer backbone. We apply MeLD to generate the first large-scale, longitudinal synthetic HIV cohort data based on the Caribbean, Central and South America Network for HIV Epidemiology (CCASAnet), one of the major international HIV consortia, encompassing data from over 60K HIV patients.
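To make the architecture description above more concrete, the sketch below shows a heavily simplified version of the latent-diffusion idea: longitudinal records are assumed to already be encoded as latent sequences, Gaussian noise is added according to a standard diffusion schedule, and a transformer is trained to predict that noise. Module names, dimensions, and the schedule are hypothetical stand-ins, not the MeLD implementation.

```python
# Highly simplified latent-diffusion sketch (hypothetical shapes and modules; not MeLD).
import torch
import torch.nn as nn

class LatentDenoiser(nn.Module):
    def __init__(self, latent_dim=32, n_heads=4, n_layers=2, n_steps=1000):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=latent_dim, nhead=n_heads, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.t_embed = nn.Embedding(n_steps, latent_dim)  # diffusion-step embedding
        self.out = nn.Linear(latent_dim, latent_dim)

    def forward(self, z_noisy, t):
        h = z_noisy + self.t_embed(t).unsqueeze(1)  # broadcast step embedding over visits
        return self.out(self.backbone(h))           # predict the added noise

# Forward diffusion: z_t = sqrt(abar_t) * z_0 + sqrt(1 - abar_t) * eps
T = 1000
betas = torch.linspace(1e-4, 0.02, T)
abar = torch.cumprod(1.0 - betas, dim=0)

batch, seq_len, latent_dim = 8, 20, 32        # e.g., 20 visits per patient (hypothetical)
z0 = torch.randn(batch, seq_len, latent_dim)  # latent-encoded records (stand-in)
t = torch.randint(0, T, (batch,))
eps = torch.randn_like(z0)
z_t = abar[t].sqrt().view(-1, 1, 1) * z0 + (1 - abar[t]).sqrt().view(-1, 1, 1) * eps

model = LatentDenoiser(latent_dim)
loss = nn.functional.mse_loss(model(z_t, t), eps)  # standard denoising objective
loss.backward()
```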
Systematic evaluations suggest that MeLD produces synthetic data of higher quality than state-of-the-art methods. Specifically, MeLD-generated data reproduce realistic survival curves spanning 40 years (log-rank test p-value = 0.78 [0.21]). Evaluation of the reproducibility of baseline risk factors in a Cox proportional hazards model suggests that 33 out of 40 hypothesis tests reach concordant conclusions. A standard privacy assessment indicates negligible risk for data sharing.
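The survival-fidelity check described above can be illustrated with a log-rank comparison of real versus synthetic event times; the sketch below uses simulated exponential data as a stand-in for CCASAnet (which is not reproduced here) and the lifelines package.

```python
# Illustrative sketch of a survival-fidelity check: compare event times from a real
# cohort and a synthetic cohort with a log-rank test (simulated data stand in here).
import numpy as np
from lifelines.statistics import logrank_test

rng = np.random.default_rng(1)
n = 5000
t_real_raw = rng.exponential(scale=15.0, size=n)   # years to event, "real" cohort
t_synth_raw = rng.exponential(scale=15.2, size=n)  # years to event, "synthetic" cohort
e_real = (t_real_raw < 40).astype(int)             # administrative censoring at 40 years
e_synth = (t_synth_raw < 40).astype(int)
t_real = np.minimum(t_real_raw, 40)
t_synth = np.minimum(t_synth_raw, 40)

res = logrank_test(t_real, t_synth, event_observed_A=e_real, event_observed_B=e_synth)
print(res.p_value)  # large p-value => no detectable difference between the curves
```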
In conclusion, we develop a novel method for generating synthetic longitudinal cohort data and create the first large-scale, high-quality multi-country synthetic longitudinal HIV patient cohort that replicates the complexity and structure of the original CCASAnet data. This represents a significant step forward in promoting data sharing and facilitating HIV research and applications.
Authors: Zhuohui J. Liang, Zhuohang Li, Nicholas J. Jackson, Yanink Caro, Ronaldo Ismerio, Fabio Paredes, Amir Asiaee, Stephany N. Duda, Bradley A. Malin, Bryan E. Shepherd, and Chao Yan