
Reproducible Extraction of Drug–Event Latency From Pharmacovigilance Narratives Using Large Language Models and Statistical Monitoring

Presented During: CS009 Lightning Session 1, Part 1

Conference: Symposium on Data Science and Statistics (SDSS) 2026

04/29/2026: 1:15 PM - 2:45 PM CDT
Lightning

Description

Accurate estimation of the time elapsed from exposure to a pharmaceutical product to the occurrence of an adverse event (i.e., latency) is essential for pharmacovigilance decision-making. In practice, latency information is often embedded in unstructured Individual Case Safety Report (ICSR) narratives and must be extracted manually, a process that is time-consuming, error-prone, and difficult to scale. We describe a hybrid statistical and generative AI framework that automates the extraction of dose dates, event onset dates, and supporting quotations from individual case narratives. The framework computes latency as either a point estimate or a bounded interval, depending on the precision of the available dates. The system uses a large language model (OpenAI o3) to identify fully and partially specified dates and to extract explicit latency statements. Fully specified dates allow latency to be calculated directly, whereas partial dates yield bounded latency ranges. When latency is expressed narratively, targeted parsing methods derive the corresponding numeric estimates. To mitigate LLM output variability, each extraction is repeated ten times and the results are aggregated using a modal consensus rule. Aggregation yields the most likely (modal) final dates or intervals together with a reproducibility score, defined as the proportion of runs that support the modal outcome. The pipeline was evaluated on 273 real-world ICSRs with reviewer-annotated gold standards. For high-reproducibility cases (~90% reproducibility), the approach achieved ~89% agreement for first-dose latency and ~78% agreement for recent-dose latency, while reducing reviewer latency-extraction time by approximately 50%. Reproducibility was strongly correlated with accuracy and served as a practical confidence indicator for prioritizing human review.
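The point-versus-interval latency logic described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the `PartialDate` class and `latency_days` function are hypothetical, a missing month or day is assumed to mean the date could fall anywhere in the enclosing year or month, and latency is measured in whole calendar days.

```python
import calendar
from dataclasses import dataclass
from datetime import date
from typing import Optional, Tuple


@dataclass(frozen=True)
class PartialDate:
    """Hypothetical representation of a date whose month and/or day may be unknown."""
    year: int
    month: Optional[int] = None
    day: Optional[int] = None

    def bounds(self) -> Tuple[date, date]:
        """Return the earliest and latest calendar dates consistent with this partial date."""
        if self.month is None:
            # Only the year is known (any day value is ignored): span the whole year.
            return date(self.year, 1, 1), date(self.year, 12, 31)
        if self.day is None:
            # Year and month known: span the whole month, respecting month length and leap years.
            last_day = calendar.monthrange(self.year, self.month)[1]
            return date(self.year, self.month, 1), date(self.year, self.month, last_day)
        # Fully specified date: both bounds coincide.
        exact = date(self.year, self.month, self.day)
        return exact, exact


def latency_days(dose: PartialDate, onset: PartialDate) -> Tuple[int, int]:
    """Latency in days as (lower, upper); lower == upper when both dates are fully specified."""
    dose_lo, dose_hi = dose.bounds()
    onset_lo, onset_hi = onset.bounds()
    # Shortest possible latency: earliest possible onset minus latest possible dose.
    lower = (onset_lo - dose_hi).days
    # Longest possible latency: latest possible onset minus earliest possible dose.
    upper = (onset_hi - dose_lo).days
    return lower, upper


if __name__ == "__main__":
    # Both dates precise: a point estimate.
    print(latency_days(PartialDate(2024, 3, 1), PartialDate(2024, 3, 15)))  # (14, 14)
    # Onset known only to the month: a bounded interval.
    print(latency_days(PartialDate(2024, 3, 1), PartialDate(2024, 4)))  # (31, 60)
```

When the dose and onset ranges overlap, the lower bound can be zero or negative; whether to clip it at zero is a policy choice that would depend on what the narrative says about event timing.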
These findings demonstrate that reproducible generative AI pipelines, integrated with statistical aggregation and sampling-based quality assurance, can reliably accelerate latency extraction from ICSRs while preserving transparency and human oversight. Providing accuracy and reproducibility estimates for the analyzed cases supports continuous oversight of model performance by an end user in the loop. This approach enables more scalable, timely, and defensible pharmacovigilance workflows.
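The modal consensus rule and reproducibility score described in the abstract can likewise be sketched. The names `Consensus`, `modal_consensus`, and `needs_review`, the tie-breaking behavior, and the 0.9 review threshold are illustrative assumptions rather than details of the framework; each run's output is assumed to have already been reduced to a hashable value, such as a `(lower, upper)` latency tuple.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Hashable, Sequence


@dataclass(frozen=True)
class Consensus:
    value: Hashable          # modal (most frequent) outcome across runs
    reproducibility: float   # proportion of runs that agree with the modal outcome
    n_runs: int


def modal_consensus(runs: Sequence[Hashable]) -> Consensus:
    """Aggregate repeated LLM extractions of one case with a simple mode rule."""
    if not runs:
        raise ValueError("need at least one run")
    counts = Counter(runs)
    # most_common() orders equal counts by first occurrence, so ties go to the earliest run.
    value, support = counts.most_common(1)[0]
    return Consensus(value=value, reproducibility=support / len(runs), n_runs=len(runs))


def needs_review(result: Consensus, threshold: float = 0.9) -> bool:
    """Flag a case for human review when run-to-run agreement falls below a (hypothetical) threshold."""
    return result.reproducibility < threshold


if __name__ == "__main__":
    # Ten hypothetical runs: nine agree on a 14-day point estimate, one returns an interval.
    runs = [(14, 14)] * 9 + [(31, 60)]
    result = modal_consensus(runs)
    print(result.value, result.reproducibility, needs_review(result))  # (14, 14) 0.9 False
```

Using a reproducibility cutoff this way mirrors the abstract's use of reproducibility as a confidence indicator: low-agreement cases go to a reviewer first, while high-agreement cases could be spot-checked through sampling.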

Keywords

Large language models (LLMs)

Generative AI

Pharmacovigilance and Safety Analytics

Latency Estimation

Reproducibility

Statistical Monitoring

Presenting Author

Swarnita Chakraborty, Johnson & Johnson

First Author

Swarnita Chakraborty, Johnson & Johnson

CoAuthor(s)

Geoffrey Gipson, Johnson & Johnson
Yauheniya Cherkas, Johnson & Johnson
Ricardo Vale de Andrade, Johnson & Johnson
Joao Barbosa, Johnson & Johnson
Zainab Aziz Zaveri, Johnson & Johnson
Hien Bui, Johnson & Johnson
Mark Oliver Amponin, Johnson & Johnson

Tracks

AI and LLM Applications

Symposium on Data Science and Statistics (SDSS) 2026