The Long Arc of Care, Reimagined: Generative Cohorts That Preserve HIV Clinical Fidelity
Thursday, Aug 7: 9:35 AM - 9:55 AM
Topic-Contributed Paper Session
Music City Center
Longitudinal patient records are invaluable for understanding the evolving needs and outcomes of individuals with chronic diseases, such as people living with HIV. Restrictions on data sharing due to privacy concerns are impeding efforts to address the global HIV epidemic. Recent advances in generative artificial intelligence (AI) create a promising opportunity to simulate realistic synthetic HIV cohort data to address privacy challenges.
However, generating high-quality longitudinal patient records still faces major challenges. Chronic disease cohorts, like those for HIV, span decades, include diverse data types, and focus on critical time-to-event outcomes. Current methods struggle to capture this temporal depth and clinical complexity, and most are evaluated on short horizons.
To address these challenges, we introduce MeLD (Medical Longitudinal Latent Diffusion), a latent diffusion-based synthetic longitudinal data generator built on a transformer backbone. We apply MeLD to generate the first large-scale, longitudinal synthetic HIV cohort data based on the Caribbean, Central and South America Network for HIV Epidemiology (CCASAnet), one of the major international HIV consortia, encompassing data from over 60K HIV patients.
Systematic evaluations suggest that MeLD produces higher-quality synthetic data than state-of-the-art methods. Specifically, MeLD-generated data reproduce realistic survival curves spanning 40 years (log-rank test p-value = 0.78 [0.21]). In an evaluation of how well baseline risk factors are reproduced in a Cox proportional hazards model, 33 of 40 hypothesis tests reach concordant conclusions. A standard privacy assessment indicates negligible risks for data sharing.
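As an illustration of the two fidelity checks mentioned above, the following is a minimal sketch of a two-sample log-rank test and a simple concordance count. It is not the authors' evaluation code. The function names, the toy data, the 0.05 significance level, and the definition of "concordant" (the same reject / fail-to-reject decision in the real and synthetic cohorts) are all assumptions made for illustration.

```python
import math


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-sample log-rank test; returns (chi-square statistic, p-value).

    times_*  : follow-up times per subject
    events_* : 1 if the event was observed, 0 if the subject was censored
    """
    # Distinct times at which at least one event occurred in either group.
    event_times = sorted(
        {t for t, e in zip(times_a, events_a) if e}
        | {t for t, e in zip(times_b, events_b) if e}
    )
    observed_a = expected_a = variance = 0.0
    for t in event_times:
        # Subjects still at risk just before time t.
        n_a = sum(1 for x in times_a if x >= t)
        n_b = sum(1 for x in times_b if x >= t)
        # Events occurring exactly at time t.
        d_a = sum(1 for x, e in zip(times_a, events_a) if e and x == t)
        d_b = sum(1 for x, e in zip(times_b, events_b) if e and x == t)
        n, d = n_a + n_b, d_a + d_b
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:
            # Hypergeometric variance of the event count in group A.
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0, 1.0
    stat = (observed_a - expected_a) ** 2 / variance
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


def count_concordant(real_p_values, synth_p_values, alpha=0.05):
    """Count tests with the same reject / fail-to-reject decision.

    This definition of concordance is an assumption for illustration.
    """
    return sum(
        (p_real < alpha) == (p_synth < alpha)
        for p_real, p_synth in zip(real_p_values, synth_p_values)
    )


# Toy data (made up): follow-up in years, with event indicators.
real_times, real_events = [2, 5, 7, 11, 15, 20], [1, 0, 1, 1, 0, 1]
synth_times, synth_events = [3, 4, 8, 10, 16, 19], [1, 0, 1, 1, 0, 1]
stat, p = logrank_test(real_times, real_events, synth_times, synth_events)
print(f"log-rank chi-square = {stat:.3f}, p-value = {p:.3f}")
```

Under this reading, a large p-value means the test finds no evidence that the real and synthetic survival curves differ, which is the desired outcome for synthetic data. In practice a maintained survival analysis library would be preferable to hand-written code.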
In conclusion, we develop a novel method for generating synthetic longitudinal cohort data and create the first large-scale, high-quality multi-country synthetic longitudinal HIV patient cohort that replicates the complexity and structure of the original CCASAnet data. This represents a significant step forward in promoting data sharing and facilitating HIV research and applications.
Authors: Zhuohui J. Liang, Zhuohang Li, Nicholas J. Jackson, Yanink Caro, Ronaldo Ismerio, Fabio Paredes, Amir Asiaee, Stephany N. Duda, Bradley A. Malin, Bryan E. Shepherd, and Chao Yan