Harnessing AI/ML and Advanced Analytics in Clinical Development

Binbing Yu, Chair
AstraZeneca
 
Monday, Aug 4: 2:00 PM - 3:50 PM
4071 
Contributed Papers 
Music City Center 
Room: CC-209B 

Main Sponsor

Biopharmaceutical Section

Presentations

Statistical Analysis Plan (SAP) Automation with Generative AI

A Statistical Analysis Plan (SAP) is typically written by statisticians, is based primarily on the study protocol, and can take several weeks to finalize. We present our efforts to build tools that use generative AI to augment and automate SAP writing.

A key step in SAP writing involves parsing the protocol and mapping its information to the appropriate SAP sections. This information may be copied, summarized, split, or fused with additional sources. Protocols are ingested into a knowledge graph, and Large Language Models (LLMs) are used at several stages. GPT-o1 is queried through a secure Pfizer API, together with vector search and Retrieval-Augmented Generation (RAG).
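As a rough illustration of the retrieval step in a RAG pipeline, the sketch below ranks protocol passages against a SAP section title and assembles a prompt from the best matches. It is not the authors' implementation: the function names, the sample passages, and the prompt wording are hypothetical, and plain word-count vectors stand in for the embedding model and vector store that a production system would use.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query, passages, k=2):
    """Return up to k passages with non-zero similarity, best first."""
    q = Counter(tokenize(query))
    scored = [(cosine(q, Counter(tokenize(p))), p) for p in passages]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [p for score, p in scored[:k] if score > 0]


def build_prompt(section_title, passages, k=2):
    """Assemble an LLM prompt from the retrieved excerpts (hypothetical wording)."""
    lines = [
        f"Draft the SAP section '{section_title}' "
        "using only the protocol excerpts below."
    ]
    lines += [f"- {p}" for p in retrieve(section_title, passages, k)]
    return "\n".join(lines)


# Hypothetical protocol passages, for illustration only.
PROTOCOL_PASSAGES = [
    "The primary endpoint is change from baseline in pain score at week 12.",
    "Sample size of 200 participants provides 90 percent power.",
    "Adverse events will be collected from first dose through follow-up.",
]

if __name__ == "__main__":
    print(build_prompt("primary endpoint", PROTOCOL_PASSAGES))
```

Restricting the prompt to retrieved excerpts is one way to keep generated text grounded in the protocol, which matters when truthfulness is an evaluation criterion.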

Early versions of our automation tools have been deployed for Non-Interventional, Clinical Pharmacology, and Interventional SAPs. Subject Matter Expert (SME) statisticians have played a pivotal role in the development and evaluation of these tools. Our evaluation framework focuses on truthfulness, eloquence, and reasoning, and combines automated and human metrics. User feedback indicates that the automation has increased efficiency.

Keywords

Generative AI

Statistical Analysis Plan

Clinical Trials

Regulatory documents

Automation

Writing 

Co-Author(s)

Sheraz Khan
Giannis Manousaridis, Pfizer, Inc.
Griffith Bell
Anna Plotka
Amy Lauren Ashworth, Pfizer, Inc.
Donal Neilson Gorman, Pfizer, Inc.
Feng Dai, Pfizer
Lili Jiang, Pfizer, Inc.
Yi-Chien Lee, Pfizer, Inc.
Gordon Siu, Pfizer, Inc.
Alexandra Thiry, Pfizer, Inc.
Subha Madhavan
Richard Zhang, Pfizer
Birol Emir

First Author

Rogier Landman

Presenting Author

Rogier Landman

Dynamic Text and Automatic Formatting – An R Package to Streamline Time-sensitive Formal Reports

Topline results (TLR) in clinical trials promptly disclose key trial findings to stakeholders following primary analyses. RapTLR streamlines TLR generation with functions that manipulate .docx objects. From R, users can navigate the document, modify or create content, and embed interactive appendices, which improves efficiency, accuracy, and objectivity. Performing in-text calculations in R eliminates errors from manual data entry, which is crucial when results are updated after database releases or changes in parameter specifications. RapTLR streamlines appendix construction by allowing users to automatically (1) convert outputs into .docx objects, (2) add bookmarks to each output ID, (3) append outputs to the report, and (4) create a table of contents that includes bookmarked links for smooth navigation. To ensure objectivity and timeliness in reporting, the program must be set up well before database lock so that results do not influence the pre-planned structure of the TLR. Additionally, multiple statisticians can run the same program and obtain the same results. The RapTLR package saves statisticians time while improving quality in a critical phase of a clinical trial.
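The idea of dynamic text and bookmarked appendices can be sketched in a few lines. The example below is written in Python rather than R and does not use the RapTLR API; every function name, the data layout, and the output IDs are hypothetical. It shows a sentence template that is fixed before database lock, with numbers computed from the data at run time, plus bookmark names and table-of-contents entries derived from output IDs.

```python
from string import Template


def response_rate(responders, n):
    """Percentage of responders, rounded to one decimal place."""
    return round(100.0 * responders / n, 1)


def render_sentence(template, data):
    """Fill a pre-specified sentence template with values computed from data."""
    values = {}
    for arm, counts in data.items():
        values[f"resp_{arm}"] = counts["responders"]
        values[f"n_{arm}"] = counts["n"]
        values[f"rate_{arm}"] = response_rate(counts["responders"], counts["n"])
    return Template(template).substitute(values)


def bookmark_name(output_id):
    """Keep only letters, digits, and underscores so the ID is a safe anchor name."""
    return "bm_" + "".join(c if c.isalnum() else "_" for c in output_id)


def table_of_contents(outputs):
    """One entry per appended output: ID, title, and the bookmark it links to."""
    return [f"{oid}: {title} -> #{bookmark_name(oid)}" for oid, title in outputs]


# The template is written before database lock; only the numbers change later.
SENTENCE = (
    "Response rate was ${rate_trt}% (${resp_trt}/${n_trt}) on treatment "
    "and ${rate_pbo}% (${resp_pbo}/${n_pbo}) on placebo."
)

if __name__ == "__main__":
    data = {
        "trt": {"responders": 45, "n": 150},
        "pbo": {"responders": 30, "n": 150},
    }
    print(render_sentence(SENTENCE, data))
    for line in table_of_contents([("T14.2.1", "Primary efficacy analysis")]):
        print(line)
```

Because the sentence structure is fixed in advance and the values are recomputed on each run, a new database release changes the numbers but not the wording.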

Keywords

R

Clinical Trials

Pharmaceuticals

Automation

R Package

git 

Co-Author(s)

Caesar Li, Johnson & Johnson
Antoine Stos, Johnson & Johnson
Yannick Vandendijck, Johnson & Johnson
Surya Mohanty
Lauren Crow

First Author

Xiang Li, Johnson & Johnson

Presenting Author

Lauren Crow

Ensemble Models for Differential Analysis

Inspired by ensemble models in machine learning, we propose a general framework for aggregating multiple diverse base models to boost the power of published differential association analysis (DAA) methods. We demonstrate this approach by augmenting popular DAA models with one or more biologically motivated alternatives. The resulting ensemble bypasses the challenge of selecting an optimal model and instead combines the strengths of complementary statistical models to achieve superior performance. Our proposed ensemble learning approach is platform-agnostic and can augment any existing DAA method, providing a general and flexible framework for various downstream modeling tasks across domains and data types. We performed extensive benchmarking on both simulated and experimental datasets, ranging from single-cell and bulk ribonucleic acid sequencing (RNA-Seq) to microbiome profiles. In these benchmarks, the ensemble strategy vastly outperformed non-ensemble methods, identified more differential patterns than the competitors, and displayed good control of false positive and false discovery rates across diverse scenarios. See https://github.com/himelmallick/DAssemble.
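The abstract does not state how the base models are aggregated. Purely as an illustration of the general idea, the sketch below combines per-feature p-values from several base methods with the Cauchy combination rule and then applies a Benjamini-Hochberg adjustment. The choice of combination rule, the function names, and the method labels are assumptions made for this example and are not taken from the DAssemble package.

```python
import math


def cauchy_combine(pvalues):
    """Equal-weight Cauchy combination of p-values for one feature."""
    t = sum(math.tan((0.5 - p) * math.pi) for p in pvalues) / len(pvalues)
    return 0.5 - math.atan(t) / math.pi


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def ensemble_pvalues(per_method):
    """Combine p-values feature by feature across base methods.

    per_method maps a (hypothetical) method label to a list of p-values,
    one per feature, with features in the same order for every method.
    """
    columns = zip(*per_method.values())
    return [cauchy_combine(list(ps)) for ps in columns]


if __name__ == "__main__":
    # Made-up p-values for three features from two base methods.
    per_method = {
        "base_model_a": [0.001, 0.20, 0.70],
        "base_model_b": [0.004, 0.03, 0.90],
    }
    combined = ensemble_pvalues(per_method)
    print(benjamini_hochberg(combined))
```

A combination rule of this kind needs no choice of a single best base model, which is the selection problem the ensemble is meant to avoid.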

Keywords

tweedie

differential expression

omics

data science 

Co-Author(s)

Erina Paul
Piyali Basak, Merck & Co.
Ziyu Liu, Cornell University
Jialin Gao, Cornell University
Arinjita Bhattacharyya, Merck & Co., Inc.
Chitrak Banerjee, Michigan State University
Himel Mallick, Cornell University

First Author

Suvo Chatterjee, Indiana University, Bloomington

Presenting Author

Arinjita Bhattacharyya, Merck & Co., Inc.

Innovative Data Analysis of Adverse Event in Predicting Clinical Outcomes in Cancer

While significant effort has been made to collect adverse event (AE) data in cancer clinical trials, utilization of AE data is limited and suboptimal. The key challenge lies in the complexity of AE data. When a patient experiences an AE, the event is not simply Yes or No. The severity of the event matters, as does its duration. Attribution of the event to treatment is another factor to consider. Furthermore, each unique AE is likely to occur in only a few subjects, resulting in a sparsity issue. Because of this high degree of difficulty, most AE reports in medical publications are descriptive, based simply on proportions and frequencies. We develop a novel data analysis strategy that decomposes multi-faceted AE data for downstream analysis. The approach uses event severity and treatment relatedness to form multiple subgroups. Within each subgroup, different metrics, such as occurrence, frequency, and duration, are assessed to capture a diverse range of AE content and to unlock the potential for clinical application. In a colorectal cancer (CRC) study, we demonstrate that the AE-derived metrics can identify a subset of patients who benefit from treatment, highlighting the potential clinical utility of the approach.
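The decomposition described above can be sketched as follows, under stated assumptions. The field names, the grade 3 cut-off separating high-grade from low-grade events, and the sample records are hypothetical and are not taken from the study. For each patient and each severity-by-relatedness subgroup, the code derives occurrence (any event), frequency (number of events), and duration (total days).

```python
from collections import defaultdict


def subgroup(event):
    """Label an event by severity and treatment relatedness.

    Assumption: grade 3 or higher counts as high grade.
    """
    severity = "high_grade" if event["grade"] >= 3 else "low_grade"
    relatedness = "related" if event["related"] else "unrelated"
    return (severity, relatedness)


def ae_metrics(events):
    """Per patient and subgroup: occurrence, frequency, and total duration."""
    metrics = defaultdict(
        lambda: {"occurrence": 0, "frequency": 0, "duration": 0}
    )
    for event in events:
        entry = metrics[(event["patient"], subgroup(event))]
        entry["occurrence"] = 1
        entry["frequency"] += 1
        entry["duration"] += event["duration_days"]
    return dict(metrics)


if __name__ == "__main__":
    # Made-up AE records, for illustration only.
    events = [
        {"patient": "P01", "grade": 1, "related": True, "duration_days": 4},
        {"patient": "P01", "grade": 2, "related": True, "duration_days": 6},
        {"patient": "P01", "grade": 3, "related": False, "duration_days": 2},
        {"patient": "P02", "grade": 4, "related": True, "duration_days": 10},
    ]
    for key, value in sorted(ae_metrics(events).items()):
        print(key, value)
```

Pooling individual AE terms into a handful of subgroups is one way to reduce the sparsity that arises when each unique AE affects only a few subjects.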

Keywords

adverse event

cancer

data analysis 

Co-Author(s)

Timothy Shaw, Moffitt Cancer Center
Wenyaw Chan, University of Texas-Houston
Yu-Kuan Pan, University of Texas Health Science Center in Houston

First Author

Dung-Tsa Chen, Moffitt Cancer Center

Presenting Author

Dung-Tsa Chen, Moffitt Cancer Center

Reflection on Recent Guidance and Applications of Incorporating AI into Drug Development

Artificial intelligence (AI) has demonstrated its potential to generate unique insights and substantially increase productivity across different industries. Its use in drug development and in supporting regulatory decision-making has also grown in recent years. In this talk, we review the recently released draft FDA guidance on leveraging AI to generate information or data that support regulatory decision-making regarding the safety, effectiveness, or quality of drugs. Specifically, the guidance lays out a risk-based credibility assessment framework consisting of seven steps, with a focus on assessing AI model risk and on a detailed plan to establish model credibility within the specific context of use. It also highlights the importance of keeping the AI model relevant over time as new data become available. We then apply this framework to use cases of AI in drug development, covering the intended research questions, the advantages of AI over traditional methodologies, fit-for-use data sources for training, tuning, testing, and performance assessment, and the factors contributing to success and limitations.

Keywords

artificial intelligence

guidance

use cases 

Co-Author

Weili He, AbbVie

First Author

Hongwei Wang, AbbVie

Presenting Author

Hongwei Wang, AbbVie