Recent developments and challenges in Artificial Intelligence and Machine Learning (AIML) and their impact on drug development

Richard Baumgartner, Speaker
Merck Research Laboratories
 
Tuesday, Aug 5: 2:45 PM - 3:05 PM
Topic-Contributed Paper Session 
Music City Center 
Recent advances in AI and ML have shown significant potential to accelerate drug development. Two prominent areas are deep learning for computer vision and generative AI in the form of large language models (LLMs), which are fundamentally transforming the analysis of unstructured imaging and natural-language data. As these new AI/ML algorithms are integrated into data analysis pipelines, important practical statistical questions are emerging. For instance, in drug discovery, deep learning in digital pathology is showing substantial potential in prediction tasks such as disease staging in MASH (metabolic dysfunction-associated steatohepatitis) and quantification of histopathology biomarkers in oncology. Developing these predictive models requires high-quality ground truth (gold) labels, typically obtained through manual labeling by pathologists or other experts. Manual labeling is challenging, however: it is labor-intensive and prone to disagreement among experts about the true labels, which highlights the need for efficient label aggregation. In this contribution, we will systematically review methods of label aggregation in terms of their performance characteristics (sensitivity/specificity, positive/negative predictive value). We will investigate commonly used rule-based methods, such as majority or weighted voting, as well as approaches that account for individual reader performance, such as the Dawid-Skene model. Moreover, we will discuss the impact of imperfect labels on predicted outcomes and how AI assistance in pathologists' ratings may help create improved gold labels.
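To make the two families of aggregation methods named in the abstract concrete, the following is a minimal sketch, not the speaker's implementation. It assumes binary labels (0/1) arranged as an items-by-readers array with no missing ratings. The function names `majority_vote` and `dawid_skene_binary`, the vote-fraction initialization, the fixed iteration count, and the toy ratings are all illustrative assumptions. The second function is a simplified two-class version of the Dawid-Skene model fitted by EM, in which each reader has their own sensitivity and specificity.

```python
import numpy as np

def majority_vote(labels):
    """Rule-based aggregation: label an item 1 if at least half the readers say 1.

    labels: (n_items, n_readers) array of 0/1 ratings.
    Ties resolve to 1, which is an arbitrary choice made for this sketch.
    """
    return (labels.mean(axis=1) >= 0.5).astype(int)

def dawid_skene_binary(labels, n_iter=50, eps=1e-6):
    """Simplified two-class Dawid-Skene model fitted with EM.

    Returns the posterior P(true label = 1) per item, plus the estimated
    per-reader sensitivity and specificity and the estimated prevalence.
    """
    labels = np.asarray(labels, dtype=float)
    # Assumed initialization: posterior = fraction of readers voting 1.
    post = labels.mean(axis=1)
    for _ in range(n_iter):
        # M-step: update prevalence and each reader's sensitivity/specificity,
        # weighting items by the current posterior of the true label.
        prev = np.clip(post.mean(), eps, 1 - eps)
        sens = (post @ labels) / (post.sum() + eps)
        spec = ((1 - post) @ (1 - labels)) / ((1 - post).sum() + eps)
        sens = np.clip(sens, eps, 1 - eps)
        spec = np.clip(spec, eps, 1 - eps)
        # E-step: posterior of the true label given all readers' ratings,
        # assuming readers are conditionally independent given the truth.
        log_p1 = np.log(prev) + (
            labels * np.log(sens) + (1 - labels) * np.log(1 - sens)
        ).sum(axis=1)
        log_p0 = np.log(1 - prev) + (
            (1 - labels) * np.log(spec) + labels * np.log(1 - spec)
        ).sum(axis=1)
        post = 1.0 / (1.0 + np.exp(log_p0 - log_p1))
    return post, sens, spec, prev

# Toy ratings (made up for illustration): 6 items, 3 readers.
# The third reader disagrees with the other two on items 2 and 4.
ratings = np.array([
    [1, 1, 1],
    [1, 1, 0],
    [0, 0, 0],
    [0, 0, 1],
    [1, 1, 1],
    [0, 0, 0],
])

mv_labels = majority_vote(ratings)
post, sens, spec, prev = dawid_skene_binary(ratings)
ds_labels = (post > 0.5).astype(int)
print("majority vote:", mv_labels)
print("Dawid-Skene  :", ds_labels)
print("estimated sensitivity per reader:", np.round(sens, 3))
print("estimated specificity per reader:", np.round(spec, 3))
```

On this toy input both methods return the same labels. The difference is that the model-based fit also estimates each reader's sensitivity and specificity, and the reader who disagrees with the other two receives lower estimates. With real data, those estimates let reliable readers count for more than a simple vote would allow.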

Keywords

generative AI, deep learning and statistics

imperfect labeling

gold label estimation for imaging applications