Monday, Aug 4: 2:00 PM - 3:50 PM
4071
Contributed Papers
Music City Center
Room: CC-209B
Main Sponsor
Biopharmaceutical Section
Presentations
A Statistical Analysis Plan (SAP) is typically written by statisticians based primarily on the study protocol, and can take several weeks to finalize. We present tools that harness generative AI to augment and automate SAP writing.
A key step in SAP writing is parsing the protocol and mapping its information to the appropriate sections; this information may be copied, summarized, split, or fused with additional sources. Protocols are ingested into a knowledge graph, and Large Language Models (LLMs) play a role at various stages: a secure Pfizer API is used to query GPT-o1, alongside vector search and Retrieval-Augmented Generation (RAG).
Early versions of our automation tools have been deployed for Non-Interventional, Clinical Pharmacology, and Interventional SAPs. Subject Matter Expert (SME) statisticians have played a pivotal role in the development and evaluation of these tools. Our evaluation framework focuses on truthfulness, eloquence, and reasoning, and consists of both automated and human metrics. User feedback indicates that the automation has successfully increased efficiency.
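The retrieval step described above can be illustrated with a minimal, self-contained sketch. Bag-of-words cosine similarity stands in for real embeddings, and the section names and protocol snippets are invented for the example; this is not the authors' implementation.

```python
import math
from collections import Counter

def embed(text):
    """Crude bag-of-words 'embedding': lowercase token counts."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Hypothetical protocol snippets indexed by SAP section.
corpus = {
    "Endpoints": "primary endpoint is overall survival measured from randomization",
    "Sample Size": "sample size of 200 subjects provides 90 percent power",
    "Analysis Populations": "intent to treat population includes all randomized subjects",
}

def retrieve(query, k=1):
    """Return the k most similar snippets to the query (the 'R' in RAG)."""
    q = embed(query)
    ranked = sorted(corpus.items(), key=lambda kv: cosine(q, embed(kv[1])), reverse=True)
    return ranked[:k]

def build_prompt(query):
    """Augment the LLM prompt with retrieved context before generation."""
    section, snippet = retrieve(query)[0]
    return f"Context ({section}): {snippet}\n\nTask: draft the SAP text for: {query}"

print(retrieve("how many subjects are needed for power")[0][0])
```

A production pipeline would replace the toy embedding with a proper embedding model and route the assembled prompt to the LLM for drafting.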
Keywords
Generative AI
Statistical Analysis Plan
Clinical Trials
Regulatory documents
Automation
Writing
Co-Author(s)
Sheraz Khan
Giannis Manousaridis, Pfizer, Inc.
Griffith Bell
Anna Plotka
Amy Lauren Ashworth, Pfizer, Inc.
Donal Neilson Gorman, Pfizer, Inc.
Feng Dai, Pfizer
Lili Jiang, Pfizer, Inc.
Yi-Chien Lee, Pfizer, Inc.
Gordon Siu, Pfizer, Inc.
Alexandra Thiry, Pfizer, Inc.
Subha Madhavan
Richard Zhang, Pfizer
Birol Emir
First Author
Rogier Landman
Presenting Author
Rogier Landman
Topline results (TLR) in clinical trials promptly disclose key trial findings to stakeholders following primary analyses. RapTLR enhances TLR generation using functions to manipulate .docx objects. In R, users can navigate the document, modify or create content, and embed interactive appendices, significantly boosting efficiency, accuracy, and objectivity. Incorporating in-text calculations in R ensures precision by eliminating errors from manual data entry, which is crucial during updates from database releases or changes in parameter specifications. RapTLR streamlines appendix construction by allowing users to automatically (1) convert outputs into .docx objects, (2) add bookmarks to each output ID, (3) append outputs to the report, and (4) create a table of contents that includes bookmarked links for smooth navigation. To ensure objectivity and timeliness in reporting, the program must be set up well before database lock so that results do not impact the pre-planned structure of the TLR. Additionally, multiple statisticians can run the same program and obtain the same results. The RapTLR package saves statisticians time while improving quality at a critical phase of a clinical trial.
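The four-step appendix workflow (convert, bookmark, append, build a table of contents) can be sketched language-agnostically with plain objects standing in for .docx manipulation. RapTLR itself is an R package and its real API will differ; all names below are invented for illustration.

```python
# Toy model of the appendix workflow: each output gets an ID, a bookmark
# (its position in the report body), and an entry in the table of contents.
class Report:
    def __init__(self):
        self.bookmarks = {}  # output ID -> position in the report body
        self.body = []

    def append_output(self, output_id, content):
        # Steps 1-3: "convert" the output to report content, bookmark it, append it.
        self.bookmarks[output_id] = len(self.body)
        self.body.append(f"[{output_id}] {content}")

    def table_of_contents(self):
        # Step 4: a table of contents of bookmarked links for navigation.
        return [f"{oid} -> entry {pos}" for oid, pos in self.bookmarks.items()]

report = Report()
report.append_output("T14.1.1", "Demographics summary table")
report.append_output("F14.2.3", "Kaplan-Meier plot, primary endpoint")
print(report.table_of_contents())
```

Because the assembly is deterministic given the same inputs, any statistician re-running the program reproduces the same report, which is the reproducibility property the abstract emphasizes.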
Keywords
R
Clinical Trials
Pharmaceuticals
Automation
R Package
git
Inspired by ensemble models in machine learning, we propose a general framework for aggregating multiple diverse base models to boost the power of published differential association analysis (DAA) methods. We demonstrate this approach by augmenting popular DAA models with one or more biologically motivated alternatives. The resulting ensemble bypasses the challenge of selecting an optimal model, instead combining the strengths of complementary statistical models to achieve superior performance. Our proposed ensemble learning approach is platform-agnostic and can augment any existing DAA method, providing a general and flexible framework for various downstream modeling tasks across domains and data types. We performed extensive benchmarking across both simulated and experimental datasets, from single-cell and bulk ribonucleic acid sequencing (RNA-Seq) to microbiome profiles, where the ensemble strategy vastly outperformed non-ensemble methods, identified more differential patterns than competing approaches, and displayed good control of false positive and false discovery rates across diverse scenarios. Software is available at https://github.com/himelmallick/DAssemble.
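One concrete way to aggregate per-feature evidence from multiple base DAA models is a p-value combination rule such as the Cauchy combination test (ACAT), sketched below with the standard library. This is one plausible aggregation strategy for illustration; the DAssemble package's actual ensemble rule may differ.

```python
import math

def cauchy_combine(pvals, weights=None):
    """Cauchy combination test: aggregate p-values from several base models
    into one ensemble p-value per feature; robust to dependence among tests."""
    if weights is None:
        weights = [1.0 / len(pvals)] * len(pvals)
    # Transform each p-value to a Cauchy variate and take a weighted sum.
    stat = sum(w * math.tan((0.5 - p) * math.pi) for w, p in zip(weights, pvals))
    # Back-transform via the standard Cauchy CDF.
    return 0.5 - math.atan(stat) / math.pi

# Hypothetical p-values for one gene from three base DAA models.
print(round(cauchy_combine([0.01, 0.03, 0.40]), 4))
```

Applied gene by gene, such a rule lets strong evidence from any single base model carry through to the ensemble, which is the complementarity the abstract describes.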
Keywords
tweedie
differential expression
omics
data science
Co-Author(s)
Erina Paul
Piyali Basak, Merck & Co.
Ziyu Liu, Cornell University
Jialin Gao, Cornell University
Arinjita Bhattacharyya, Merck & Co., Inc.
Chitrak Banerjee, Michigan State University
Himel Mallick, Cornell University
First Author
Suvo Chatterjee, Indiana University, Bloomington
Presenting Author
Arinjita Bhattacharyya, Merck & Co., Inc.
While significant effort has been made in collecting adverse event (AE) data in cancer clinical trials, utilization of AE data is limited and suboptimal. The key challenge lies in the complexity of AE data. When a patient experiences an AE, the event is not just Yes or No: severity matters, as does event duration. Moreover, attribution of the event to treatment relatedness is another factor for consideration. Furthermore, each unique AE likely occurs in only a few subjects, resulting in sparsity issues. Because of this high degree of difficulty, most AE reports in medical publications are purely descriptive, based on proportions and frequencies. We develop a novel data analysis strategy to decompose the multi-faceted AE data for downstream analysis. The approach uses event severity and treatment relatedness to form multiple subgroups, and within each subgroup assesses different metrics, such as occurrence, frequency, and duration, to capture a diverse range of AE content and to unlock the potential for clinical application. We demonstrate in a colorectal cancer (CRC) study that the AE-derived metrics can identify subsets of patients with treatment benefit, highlighting their potential clinical utility.
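The decomposition described above, splitting AE records by severity and treatment relatedness and computing occurrence, frequency, and duration per subgroup, can be sketched as follows. The records, severity threshold, and metric names are invented for the example; the authors' actual strategy may differ.

```python
from collections import defaultdict

# Each record: (subject, AE term, severity grade, related to treatment, duration in days)
records = [
    ("S01", "nausea",      2, True,  5),
    ("S01", "nausea",      3, True,  7),
    ("S01", "fatigue",     1, False, 3),
    ("S02", "neutropenia", 4, True, 10),
]

def decompose(records, severe_grade=3):
    """Per (subgroup, subject) metrics: occurrence, event count, total duration."""
    metrics = defaultdict(lambda: {"occurrence": 0, "frequency": 0, "duration": 0})
    for subj, term, grade, related, dur in records:
        # Subgroups are formed by crossing severity with treatment relatedness.
        subgroup = ("severe" if grade >= severe_grade else "mild",
                    "related" if related else "unrelated")
        m = metrics[(subgroup, subj)]
        m["occurrence"] = 1      # did the subject have any event in this subgroup?
        m["frequency"] += 1      # how many events?
        m["duration"] += dur     # total days affected
    return dict(metrics)

m = decompose(records)
print(m[(("severe", "related"), "S01")])
```

Each (subgroup, metric) pair then becomes a feature for downstream analyses such as identifying patient subsets with differential treatment benefit.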
Keywords
adverse event
cancer
data analysis
Artificial intelligence (AI) has demonstrated its potential in generating unique insights and substantially increasing productivity across different industries. Its use in drug development and in supporting regulatory decision-making has also grown in recent years. In this talk, we review the recently released draft FDA guidance on leveraging AI to generate information or data that supports regulatory decision-making regarding the safety, effectiveness, or quality of drugs. Specifically, the guidance lays out a risk-based credibility assessment framework consisting of seven steps, with a focus on assessing AI model risk and on a detailed plan to establish credibility within the specific context of use. It also highlights the importance of keeping the AI model relevant over time as new data become available. We then apply this framework to use cases of AI in drug development, covering the intended research questions, the advantages of AI over traditional methodologies, fit-for-use data sources for training, tuning, testing, and performance assessment, and the factors contributing to success as well as the limitations.
Keywords
artificial intelligence
guidance
use cases