Tuesday, Aug 5: 2:00 PM - 3:50 PM
0608
Topic-Contributed Paper Session
Music City Center
Room: CC-Davidson Ballroom B
Clinical Trials
Applied
Yes
Main Sponsor
Biopharmaceutical Section
Co Sponsors
Biometrics Section
International Chinese Statistical Association
Presentations
This presentation introduces the basic concepts and fundamental methods of causal inference relevant to pharmaceutical statistics. It begins with the central questions in drug development and licensing and the roles of causal inference and AI/ML in answering them, and then proceeds in three parts: (1) the estimand framework, (2) efficient estimators, and (3) targeted learning. The presentation covers causal thinking for the study designs commonly used in the pharmaceutical industry, including but not limited to randomized controlled clinical trials, single-arm clinical trials with external controls, and real-world evidence studies. The material covered in this presentation is drawn from the presenter's book, Causal Inference in Pharmaceutical Statistics, published by Chapman & Hall/CRC in 2024.
Keywords
Clinical Trials
Estimand
Efficiency
Machine Learning
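As a rough illustration of what an "efficient estimator" can look like in a randomized controlled trial, the sketch below compares the unadjusted difference in means with an augmented inverse probability weighted (AIPW) estimator. It is not taken from the presenter's book. The function names, the single baseline covariate, the per-arm linear outcome models, and the 1:1 randomization probability are all assumptions made for illustration.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def diff_in_means(a, y):
    """Unadjusted estimate: mean outcome on treatment minus mean on control."""
    y1 = [yi for ai, yi in zip(a, y) if ai == 1]
    y0 = [yi for ai, yi in zip(a, y) if ai == 0]
    return sum(y1) / len(y1) - sum(y0) / len(y0)


def aipw_ate(x, a, y, p=0.5):
    """AIPW estimate of the average treatment effect.

    Assumes a known randomization probability p (hypothetical 1:1 design)
    and a working linear outcome model fitted separately in each arm.
    """
    a1, b1 = fit_line([xi for xi, ai in zip(x, a) if ai == 1],
                      [yi for yi, ai in zip(y, a) if ai == 1])
    a0, b0 = fit_line([xi for xi, ai in zip(x, a) if ai == 0],
                      [yi for yi, ai in zip(y, a) if ai == 0])
    total = 0.0
    for xi, ai, yi in zip(x, a, y):
        m1 = a1 + b1 * xi  # predicted outcome under treatment
        m0 = a0 + b0 * xi  # predicted outcome under control
        # Model-based contrast plus inverse-probability-weighted residuals.
        total += (m1 - m0) + ai * (yi - m1) / p - (1 - ai) * (yi - m0) / (1 - p)
    return total / len(x)


def simulate_trial(n, effect, seed):
    """Simulate a 1:1 trial with one prognostic covariate (illustrative only)."""
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]
    a = [1 if rng.random() < 0.5 else 0 for _ in range(n)]
    y = [1.0 + 2.0 * xi + effect * ai + rng.gauss(0, 1) for xi, ai in zip(x, a)]
    return x, a, y


if __name__ == "__main__":
    x, a, y = simulate_trial(n=2000, effect=1.0, seed=1)
    print("unadjusted:", diff_in_means(a, y))
    print("AIPW:      ", aipw_ate(x, a, y))
```

Because treatment is randomized, both estimators target the same estimand. The adjusted estimator typically has smaller variance when the covariate is prognostic, which is the sense of "efficient" assumed in this sketch.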
The increasingly competitive landscape of drug development is motivating accelerated timelines and more efficient processes. Simultaneously, the rapid evolution of artificial intelligence (AI) has reshaped the development landscape, presenting new opportunities for increasing development efficiency. In particular, new causal inference methodologies such as targeted maximum likelihood estimation (TMLE), double/debiased machine learning (DML), and generalized random forests (GRF) leverage AI/ML capabilities while preserving statistical inference on target estimands, which is crucial in both clinical reporting and internal decision-making. These methods are anticipated to help shorten study timelines, conserve resources, and support internal decision-making. We will discuss some of these industry trends as well as Sanofi's efforts to fill identified gaps. In particular, we present efforts to industrialize processes for using these methods, as well as indication-specific simulation and evaluation of candidate methodologies.
Our principal message is that for practical use in clinical trials, causal ML methods can be evaluated and compared only in the context of a specific application and only taking into consideration the anticipated multivariate distribution for the target indication and population.
Keywords
Causal Machine Learning
Simulation
Pharmaceutical development
Recent advances in AI and ML have shown significant potential to accelerate drug development. Two prominent areas are deep learning for computer vision and generative AI-based large language models (LLMs), which are fundamentally transforming the analysis of unstructured data from imaging and natural language. As these new AI/ML algorithms are integrated into data analysis pipelines, important practical statistical questions are emerging. For instance, in drug discovery, deep learning in digital pathology is demonstrating substantial potential in prediction tasks such as disease staging in MASH (metabolic dysfunction-associated steatohepatitis) or quantifying histopathology biomarkers in oncology. Developing these predictive models requires high-quality ground truth (gold) labels, typically obtained from manual labeling by pathologists or other experts. Manual labeling is challenging, however: it is labor-intensive and prone to disagreement among experts regarding the true labels, which highlights the need for efficient label aggregation. In this contribution, we will review and systematically compare various methods of label aggregation in terms of their performance characteristics (sensitivity/specificity, positive/negative predictive value). We will investigate commonly used rule-based methods, such as majority or weighted voting, as well as approaches more attuned to reader performance, such as the Dawid-Skene model. Moreover, we will discuss the impact of imperfect labels on predicted outcomes and how AI assistance in pathologists' ratings may be beneficial in creating improved gold labels.
Keywords
generative AI, deep learning and statistics
imperfect labeling
gold label estimation for imaging applications
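The label aggregation methods named in the abstract above can be sketched for the simplest case of binary labels. This is a minimal illustration, not the presenters' implementation. The function names, the tie-breaking rule, the initialization, and the fixed iteration count are assumptions. The Dawid-Skene variant shown is the two-class model with one sensitivity and one specificity per reader, fitted by EM.

```python
def majority_vote(labels):
    """labels[i][j] is reader j's 0/1 rating of item i; ties go to 1."""
    return [1 if 2 * sum(r) >= len(r) else 0 for r in labels]


def weighted_vote(labels, weights):
    """Vote with one non-negative weight per reader; ties go to 1."""
    out = []
    for r in labels:
        pos = sum(w for v, w in zip(r, weights) if v == 1)
        neg = sum(w for v, w in zip(r, weights) if v == 0)
        out.append(1 if pos >= neg else 0)
    return out


def sens_spec(pred, truth):
    """Sensitivity and specificity of aggregated labels against reference labels."""
    tp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 1)
    tn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 0)
    n_pos = sum(truth)
    n_neg = len(truth) - n_pos
    sens = tp / n_pos if n_pos else float("nan")
    spec = tn / n_neg if n_neg else float("nan")
    return sens, spec


def dawid_skene_binary(labels, n_iter=50, eps=1e-6):
    """EM for a two-class Dawid-Skene model.

    Returns (posterior P(true label = 1) per item,
             estimated sensitivity per reader,
             estimated specificity per reader).
    """
    def clamp(v):
        return min(max(v, eps), 1.0 - eps)

    n_items, n_readers = len(labels), len(labels[0])
    # Initialize posteriors with the fraction of positive votes per item.
    post = [sum(r) / n_readers for r in labels]
    sens = [0.5] * n_readers
    spec = [0.5] * n_readers
    for _ in range(n_iter):
        # M-step: prevalence and per-reader error rates from soft labels.
        w1 = max(sum(post), eps)
        w0 = max(n_items - sum(post), eps)
        prev = clamp(sum(post) / n_items)
        for j in range(n_readers):
            sens[j] = clamp(sum(post[i] * labels[i][j]
                                for i in range(n_items)) / w1)
            spec[j] = clamp(sum((1 - post[i]) * (1 - labels[i][j])
                                for i in range(n_items)) / w0)
        # E-step: posterior of the true label given all readers' ratings.
        for i in range(n_items):
            l1, l0 = prev, 1.0 - prev
            for j in range(n_readers):
                if labels[i][j] == 1:
                    l1 *= sens[j]
                    l0 *= 1.0 - spec[j]
                else:
                    l1 *= 1.0 - sens[j]
                    l0 *= spec[j]
            post[i] = l1 / (l1 + l0)
    return post, sens, spec
```

Thresholding the returned posteriors at 0.5 yields aggregated labels that can be passed to `sens_spec` in the same way as the voting results, so the methods can be compared on the same performance characteristics.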