Reproducibility of Large Language Models for Rare Disease Diagnosis Using Unstructured Electronic Health Records

Cathy Shyr, Speaker
Vanderbilt University Medical Center
 
Tuesday, Aug 5: 2:30 PM - 2:55 PM
Invited Paper Session 
Music City Center 
Large language models (LLMs) are rapidly becoming a cornerstone of natural language processing because of their ability to process unstructured text. In rare disease diagnosis, LLMs could support clinical decision-making by automatically generating differential diagnoses or extracting granular disease phenotypes from patients' clinical notes in electronic health records (EHRs). However, LLM outputs can vary for several reasons, including differences in pre-training data, the models' probabilistic (sampling-based) text generation, and parameter settings such as temperature. This variability may undermine the consistency and reliability of downstream decisions. Despite the growing use of LLMs in biomedical research, their reproducibility in rare disease diagnosis remains underexplored. This study therefore evaluates the reproducibility of foundation LLMs, including OpenAI's ChatGPT and Meta's Llama models, in analyzing unstructured clinical notes from EHRs for rare disease diagnosis. The results may offer insight into the reproducibility and robustness of LLMs and inform their reliable application in rare disease research and clinical decision support.

Keywords

Large language model

Artificial intelligence

Reproducibility