Evolving Insights: The AUC/c-Index in Modern Prognostic Model Evaluation

Polyna Khudyakov Chair
Sage Therapeutics
 
Olga Demler Organizer
Brigham and Women's Hospital/Harvard Medical School Boston, USA; Swiss Federal Institute of Technology ETH Zurich, Switzerland
 
Monday, Aug 4: 8:30 AM - 10:20 AM
0534 
Invited Paper Session 
Music City Center 
Room: CC-101A 
In the AI era, prognostic and diagnostic models must be adaptable and capable of performing well in diverse and dynamic settings. Traditional metrics such as the AUC (Area Under the Receiver Operating Characteristic Curve), the c-index, and calibration tools remain the predominant methods for model validation. While useful for assessing model discrimination and calibration, these metrics are not designed to capture the complexity required to ensure models remain effective across different populations and clinical scenarios. To ensure that AI-driven models offer meaningful improvements in patient care, a comprehensive and clinically relevant discussion of the utility of the AUC and other statistics for model evaluation is essential. This session discusses aspects of model performance that must be understood to better address AI-era needs specific to healthcare settings.

Keywords

AUC

c-index

prognostic model

diagnostic model

two-class classification models

AI 

Applied

Yes

Main Sponsor

Section on Risk Analysis

Co Sponsors

Biometrics Section
Society for Medical Decision Making

Presentations

#1 Old Principles, New Models: Revisiting the Foundations of Good Prediction

An important question for any model is "how useful will it be when applied to other data sets?" Before diving into any particular metric for this, it is important to step back and ask exactly what we mean by "useful". We will find that the answer to that question is very often problem dependent. For example, assume a model M is focused on survival time of patients with a particular cancer. One possible use of M would be to predict which subjects have an expected survival of 6 months or less, for recommendation to supportive care; in this case "good" is a simple yes/no decision. A separate use might be to assign subjects to a trial of a new adjuvant therapy, one which is not expected to have any influence on deaths in the first year. In this case the ability of M to discriminate within the first year is immaterial, but discriminating among survival of 1-2, 2-4, and 4+ years might be quite important.
The main message of this talk is thus "first stop and think". We will also discuss some variations on the concordance that are more applicable to different cases. 
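As a concrete point of reference for the concordance mentioned above, here is a minimal sketch (illustrative only, not the speaker's material) of Harrell's c-index for right-censored survival data, computed by pairwise comparison:

```python
# Harrell's concordance (c-index) for right-censored survival data.
# A pair (i, j) is comparable if the subject with the shorter follow-up
# time had an observed event; the pair is concordant if that subject
# also received the higher risk score.

def c_index(times, events, risks):
    """times: follow-up times; events: 1 = event, 0 = censored;
    risks: model risk scores (higher = predicted earlier event)."""
    concordant, comparable = 0.0, 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # subject i must have the shorter time and an observed event
            if times[i] < times[j] and events[i] == 1:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5   # ties get half credit
    return concordant / comparable

# Toy data: a perfectly ordered risk score yields c-index = 1.0
times  = [2, 5, 8, 12]
events = [1, 1, 0, 1]
risks  = [0.9, 0.7, 0.5, 0.2]
print(c_index(times, events, risks))  # 1.0
```

Note that every comparable pair counts equally here, regardless of when it occurs; the talk's point is that variants restricting or weighting the time window of comparison may better match a specific clinical use.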

Keywords

prediction

concordance

survival 

Co-Author

Terry Therneau, Mayo Clinic

Speaker

Terry Therneau, Mayo Clinic

#2 A Federated Registration System for Artificial Intelligence in Health

As artificial intelligence (AI) becomes increasingly integrated into healthcare, the need for robust registries and assurance laboratories is critical to ensure transparency, reliability, and continuous monitoring of model performance. This session will explore the ongoing lifecycle of AI validation and adaptation, emphasizing the role of standardized performance metrics such as AUC and c-index in assessing and maintaining model efficacy. We will discuss how structured frameworks for model monitoring can enhance clinical trust, support regulatory compliance, and drive iterative improvements in AI-driven healthcare solutions. 

Co-Author

Michael Pencina, Duke University-Clinical Research Institute

Speaker

Michael Pencina, Duke University-Clinical Research Institute

#3 Estimating the AUC with a Graphical Lasso Method for High-dimensional Biomarkers with LOD

This talk discusses the area under the receiver operating characteristic curve (AUC) of combined biomarkers in a high-dimensional setting. We propose a penalization approach to the inference of precision matrices in the presence of the limit of detection. A new version of the expectation-maximization algorithm is then proposed for the penalized likelihood, using numerical integration and the graphical lasso method. The estimated precision matrix is then applied to the inference of AUCs. The proposed method outperforms existing methods in numerical studies. We apply the proposed method to a data set from a brain tumor study. The results show higher accuracy in the estimation of the AUC compared with existing methods.
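To illustrate how an estimated precision matrix feeds into AUC inference, here is a sketch of the standard binormal result (textbook theory only; it uses a plain matrix inverse as a stand-in for the penalized graphical-lasso estimate and does not implement the proposed EM/LOD method): under common covariance Sigma and mean difference delta, the best linear combination of biomarkers has coefficients Omega @ delta, where Omega is the precision matrix, and its AUC is Phi(sqrt(delta' Omega delta / 2)).

```python
# Binormal AUC of the optimal linear biomarker combination,
# computed from a precision matrix (here for two biomarkers).
import math

def inv2x2(m):
    """Closed-form inverse of a 2x2 matrix (stand-in for a penalized estimate)."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def auc_best_combination(delta, sigma):
    """delta: case-minus-control mean difference; sigma: common covariance."""
    omega = inv2x2(sigma)
    # quadratic form q = delta' Omega delta
    q = sum(delta[i] * omega[i][j] * delta[j]
            for i in range(2) for j in range(2))
    # AUC = Phi(sqrt(q / 2)), with Phi written via the error function
    return 0.5 * (1.0 + math.erf(math.sqrt(q / 2.0) / math.sqrt(2.0)))

delta = [1.0, 1.0]                    # each biomarker shifts by 1 SD in cases
sigma = [[1.0, 0.5], [0.5, 1.0]]      # biomarkers correlated at 0.5
print(auc_best_combination(delta, sigma))  # ~0.79
```

The same formula applies in higher dimensions; an accurate precision estimate under the limit of detection, which is the paper's contribution, directly improves the resulting AUC estimate.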

Keywords

AUC

c-index

validation

range of predictor variables 

Co-Author

Larry Tang, University of Central Florida

Speaker

Larry Tang, University of Central Florida

#4 Overcoming Limitations of AUC in Evaluating Model Performance

The Area Under the Receiver Operating Characteristic Curve (AUC) is one of the most widely used metrics for assessing the discriminatory ability of risk prediction models. It possesses several desirable properties: it is a proper scoring rule and remains independent of event rates, making it a valuable tool in model evaluation. However, its limitations are often overlooked. This presentation will focus on key challenges associated with AUC, including pitfalls in using it for external validation and how study design choices can significantly influence its value. We aim to provide a more nuanced perspective on AUC's role in model assessment and guide best practices for its application in clinical research. 
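The event-rate independence noted above can be seen in a small simulation (an illustrative sketch, not material from the talk): the empirical AUC is the Mann-Whitney probability that a random case outscores a random control, a rank-based quantity, so subsampling controls to change the apparent event rate leaves its expectation unchanged.

```python
# Empirical AUC as the Mann-Whitney statistic, evaluated at two event rates.
import random

def auc(cases, controls):
    """Fraction of (case, control) pairs ranked correctly; ties count 0.5."""
    wins = sum(1.0 if x > y else 0.5 if x == y else 0.0
               for x in cases for y in controls)
    return wins / (len(cases) * len(controls))

random.seed(2025)
cases    = [random.gauss(1.0, 1.0) for _ in range(300)]   # event group scores
controls = [random.gauss(0.0, 1.0) for _ in range(3000)]  # non-event group scores

auc_rare   = auc(cases, controls)                      # event rate ~9%
auc_common = auc(cases, random.sample(controls, 300))  # event rate ~50%
print(auc_rare, auc_common)  # both near the theoretical Phi(1/sqrt(2)) ~ 0.76
```

By contrast, metrics built on the predicted-positive group, such as positive predictive value, shift substantially under the same subsampling, which is one reason design choices can drive apparent differences in other performance measures while the AUC stays flat.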

Keywords

AUC

c-index

validation

two-class classification 

Speaker

Olga Demler, Brigham and Women's Hospital/Harvard Medical School Boston, USA; Swiss Federal Institute of Technology ETH Zurich, Switzerland