Automated Detection of Distant Metastasis in Lung Cancer from Longitudinal Reports Using State-of-the-Art Large Language Models

Summer Han Co-Author
Stanford University
 
Aparajita Khan Co-Author
Indian Institute of Technology Roorkee
 
Chloe Su Co-Author
 
Fatma Gunturkun Speaker
Stanford University
 
Wednesday, Aug 6: 3:25 PM - 3:45 PM
Topic-Contributed Paper Session 
Music City Center 
Timely and accurate identification of distant recurrence in non-small cell lung cancer (NSCLC) is critical for prognostic assessment and treatment optimization. Traditional methods that rely on structured data often miss the nuanced clinical details embedded in unstructured radiology and pathology reports. Recent advances in large language models (LLMs) offer a promising approach to automating information extraction, enabling more comprehensive and scalable analysis of recurrence patterns.

This study aims to systematically compare the performance of multiple state-of-the-art LLMs in detecting distant recurrence, including GPT-4o, o1, DeepSeek-R1, LLaMA 3.3 (70B), Gemini 1.5 Pro, and MedLM-large. The dataset comprises 30,161 radiology and pathology reports (collected between 2020 and 2022) from 2,116 lung cancer patients. A subset of 7,083 reports from 500 patients was manually annotated to serve as the test set.
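Comparing the models comes down to scoring each model's predictions against the manually annotated test set. The sketch below shows one way to do this with note-level precision, recall (sensitivity), specificity, and F1, plus an optional patient-level roll-up, since each patient contributes several longitudinal reports. The abstract does not name the metrics used. The dictionaries keyed by report ID, the function names, the treatment of missing predictions as negative, and the rule that a patient is positive if any report is positive are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Metrics:
    precision: float
    recall: float  # sensitivity
    specificity: float
    f1: float


def _ratio(num: int, den: int) -> float:
    # Guard against empty denominators (e.g. a model that never predicts positive).
    return num / den if den else 0.0


def binary_metrics(gold: Dict[str, bool], pred: Dict[str, bool]) -> Metrics:
    """Score predicted distant-recurrence labels against manual annotations.

    Both dicts map a hypothetical report ID to True if distant recurrence is documented.
    Only annotated reports are scored; a missing prediction counts as negative (assumption).
    """
    tp = fp = fn = tn = 0
    for report_id, truth in gold.items():
        guess = pred.get(report_id, False)
        if truth and guess:
            tp += 1
        elif guess:
            fp += 1
        elif truth:
            fn += 1
        else:
            tn += 1
    return Metrics(
        precision=_ratio(tp, tp + fp),
        recall=_ratio(tp, tp + fn),
        specificity=_ratio(tn, tn + fp),
        f1=_ratio(2 * tp, 2 * tp + fp + fn),  # same as 2PR / (P + R) when TP > 0
    )


def patient_level(labels: Dict[str, bool], report_to_patient: Dict[str, str]) -> Dict[str, bool]:
    """Roll report labels up to patients: positive if any report is positive (assumed rule)."""
    out: Dict[str, bool] = {}
    for report_id, flag in labels.items():
        pid = report_to_patient[report_id]
        out[pid] = out.get(pid, False) or flag
    return out


# Toy data, not from the study.
gold = {"r1": True, "r2": False, "r3": True, "r4": False}
preds = {
    "model_a": {"r1": True, "r2": True, "r3": True, "r4": False},
    "model_b": {"r1": True, "r2": False, "r3": False, "r4": False},
}
report_to_patient = {"r1": "p1", "r2": "p1", "r3": "p2", "r4": "p3"}

for name, p in preds.items():
    print(name, binary_metrics(gold, p))
print(patient_level(gold, report_to_patient))
```

Scoring every model with the same function against the same annotations keeps the comparison consistent; in practice a library routine such as scikit-learn's metric functions would serve equally well.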

Zero-shot prompting was used with standardized prompts across all models. A sample of errors was manually reviewed to identify common failure patterns, and a fairness analysis was conducted to assess potential biases across demographic subgroups. The final model was then deployed on the remaining 23,078 unannotated reports from the other lung cancer patients in the cohort.
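A minimal sketch of two steps described above, under stated assumptions: a single zero-shot prompt template reused verbatim for every model, and a per-subgroup comparison of recall for the fairness analysis. The template wording, the YES/NO answer format, the `parse_answer` fallback to `None`, the subgroup names, and the choice of a recall gap as the fairness measure are all hypothetical; they are not the study's actual prompt or fairness metric. No model API is called here; the reply strings stand in for whatever each model returns.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical zero-shot template: identical wording for every model, no worked examples.
PROMPT_TEMPLATE = (
    "You are reviewing a {report_type} report from a patient with non-small cell lung cancer.\n"
    "Does the report document distant metastasis or distant recurrence?\n"
    "Answer with one word: YES or NO.\n\n"
    "Report:\n{report_text}"
)


def build_prompt(report_type: str, report_text: str) -> str:
    return PROMPT_TEMPLATE.format(report_type=report_type, report_text=report_text)


def parse_answer(reply: str) -> Optional[bool]:
    """Map a model reply to a label; None marks an unparseable reply for manual review."""
    match = re.search(r"\b(YES|NO)\b", reply.upper())
    if match is None:
        return None
    return match.group(1) == "YES"


def recall_by_subgroup(records: Iterable[Tuple[str, bool, bool]]) -> Dict[str, float]:
    """records: (subgroup, gold_label, predicted_label). Returns recall per subgroup."""
    positives: Dict[str, int] = defaultdict(int)
    hits: Dict[str, int] = defaultdict(int)
    for group, truth, pred in records:
        if truth:
            positives[group] += 1
            if pred:
                hits[group] += 1
    # Subgroups with no gold-standard positives are omitted (recall is undefined there).
    return {g: hits[g] / n for g, n in positives.items()}


def max_gap(per_group: Dict[str, float]) -> float:
    """Largest between-subgroup difference in recall (an equal-opportunity style gap)."""
    values = list(per_group.values())
    return max(values) - min(values) if values else 0.0


# Toy example, not study data.
prompt = build_prompt("radiology", "New hepatic lesions concerning for metastases.")
labels = [parse_answer(r) for r in ["Yes.", "NO", "Unclear"]]  # [True, False, None]
records = [
    ("group_A", True, True), ("group_A", True, True), ("group_A", False, False),
    ("group_B", True, True), ("group_B", True, False), ("group_B", False, True),
]
rates = recall_by_subgroup(records)  # {"group_A": 1.0, "group_B": 0.5}
gap = max_gap(rates)  # 0.5
```

Comparing recall across subgroups checks whether patients who truly have distant recurrence are missed more often in one group than in another. Other criteria, such as precision or false-positive rate by subgroup, could be tabulated the same way.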

This study provides a structured framework for evaluating the performance of cutting-edge LLMs in clinical information extraction and underscores their potential to enhance the identification of distant recurrence in NSCLC.