Statistics, Data Science, and AI for Complex Data in Reproducible Biomedical Research

Yunhui Qi (Chair)

Sebastien Haneuse (Discussant)
Harvard T.H. Chan School of Public Health

Li-Xuan Qin (Organizer)
Memorial Sloan Kettering Cancer Center

Yunhui Qi (Organizer)
 
Tuesday, Aug 5: 2:00 PM - 3:50 PM
0553 
Invited Paper Session 
Music City Center 
Room: CC-208A 

Applied

Yes

Main Sponsor

Biometrics Section

Co-Sponsors

ENAR
Section on Statistical Learning and Data Science

Presentations

On the Importance of Statistical Thinking and Exploratory Data Analysis in Modern Biomedical Research

The 20th century's digital progress has significantly transformed scientific research. Biomedical research in particular has benefitted from new measurement technologies that make it possible to observe molecular entities we previously could not. A common feature of these new technologies is that they generate large and complex datasets. For example, instead of focusing on one gene at a time, we can now examine all genes together, shifting the approach from testing specific hypotheses to exploring and discovering new insights. However, the complexity of these data demands statistical expertise to distinguish meaningful patterns from chance or subtle systematic error, underscoring the critical role of statistics in biomedical research. Unfortunately, biomedical education has not kept pace with the demand for data analysis skills. In this talk, I will present examples that illustrate the vital role of statistical analysis and effective data visualization in genomics.

Speaker

Rafael Irizarry, Dana-Farber Cancer Institute

Reproducibility of Large Language Models for Rare Disease Diagnosis Using Unstructured Electronic Health Records

Large language models (LLMs) are rapidly becoming the cornerstone of natural language processing due to their advanced capability to process unstructured text. In the context of rare disease diagnosis, LLMs have the potential to support clinical decision-making by automatically generating differential diagnoses or extracting granular disease phenotypes from patients' clinical notes in electronic health records (EHRs). However, variability in LLM outputs can arise from multiple sources, including differences in pre-training data, the models' probabilistic nature, and parameter settings, potentially impacting the consistency and reliability of downstream decision-making. Despite the growing popularity of using LLMs in biomedical research, their reproducibility in the context of rare disease diagnosis remains underexplored. Therefore, this study aims to evaluate the reproducibility of foundational LLMs, including OpenAI's ChatGPT and Meta's Llama models, in analyzing unstructured clinical notes from EHRs for rare disease diagnosis. Results from this study can provide insight into the reproducibility and robustness of LLMs to inform their reliable application in rare disease research and clinical decision support.  

Keywords

Large language model

Artificial intelligence

Reproducibility 

Speaker

Cathy Shyr, Vanderbilt University Medical Center

AI-Driven Knowledge Graph Models for Drug Repurposing and Precision Medicine

Drug repurposing, the process of identifying new applications for existing approved drugs, offers a time- and cost-efficient approach to drug development. The explosive growth of biomedical data provides significant opportunities to advance drug repurposing and precision medicine. However, effectively integrating complex, heterogeneous data to uncover meaningful repurposing signals remains challenging. In this presentation, I will introduce our research group's work on AI-driven knowledge graph models, which systematically integrate genomic, phenotypic, pharmacological and patient data and leverage deep learning algorithms to identify candidate drugs for repurposing and personalized treatment strategies. Through case studies, I will illustrate the application of our approach in uncovering potential therapeutic signals and enabling personalized medicine. 

Speaker

Zhenxiang Gao