Monday, Aug 4: 8:30 AM - 10:20 AM
0534
Invited Paper Session
Music City Center
Room: CC-101A
In the AI era, prognostic and diagnostic models must be adaptable and capable of performing well in diverse and dynamic settings. Traditional metrics such as the AUC (Area Under the Receiver Operating Characteristic Curve), the c-index, and calibration tools remain the predominant methods for model validation. While useful for assessing discrimination and calibration, these metrics are not designed to capture the complexity required to ensure models remain effective across different populations and clinical scenarios. To ensure that AI-driven models offer meaningful improvements in patient care, a comprehensive and clinically relevant discussion of the utility of the AUC and other statistics for model evaluation is essential. This session discusses key aspects of model performance that must be understood to address AI-era needs specific to healthcare settings.
AUC
c-index
prognostic model
diagnostic model
two-class classification models
AI
Applied
Yes
Main Sponsor
Section on Risk Analysis
Co Sponsors
Biometrics Section
Society for Medical Decision Making
Presentations
An important question for any model is "how useful will it be when applied to other data sets?" Before diving into any particular metric for this, it is important to step back and ask exactly what we mean by "useful". We will find that the answer to that question is very often problem dependent. For example, assume a model M is focused on survival time of patients with a particular cancer. One possible use of M would be to predict which subjects have an expected survival of 6 months or less, for recommendation to supportive care; in this case "good" is a simple yes/no decision. A separate use might be to assign subjects to a trial of a new adjuvant therapy, one which is not expected to have any influence on deaths in the first year. In this case the ability of M to discriminate within the first year is immaterial, but discriminating among the 1-2, 2-4, and 4+ year ranges might be quite important.
The main message of this talk is thus "first stop and think". We will also discuss some variations on concordance that are more applicable to different cases.
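The idea of problem-dependent concordance above can be sketched in code. The following is a minimal, hypothetical illustration (not the speaker's method): a Harrell-type c-index in pure Python, plus an optional horizon `tau` that restricts comparisons to pairs whose earlier event falls before `tau`, mimicking the "discrimination within the first year" scenario. All data values are made up.

```python
import numpy as np

def concordance(time, event, risk, tau=None):
    """Harrell-type c-index: fraction of usable pairs ordered correctly.

    A pair (i, j) is usable when the shorter time is an observed event
    and, if tau is given, that event occurs before the horizon tau
    (restricting concordance to early follow-up). Ties in risk count 1/2.
    """
    num = den = 0.0
    n = len(time)
    for i in range(n):
        for j in range(n):
            if time[i] < time[j] and event[i] == 1 and (tau is None or time[i] < tau):
                den += 1
                if risk[i] > risk[j]:
                    num += 1
                elif risk[i] == risk[j]:
                    num += 0.5
    return num / den

# Hypothetical survival times (years), censoring indicators, and risk scores
time  = np.array([0.4, 1.1, 2.5, 3.0, 4.2, 5.0])
event = np.array([1,   1,   1,   0,   1,   0])
risk  = np.array([0.9, 0.3, 0.6, 0.4, 0.7, 0.1])

print(concordance(time, event, risk))           # overall concordance
print(concordance(time, event, risk, tau=1.0))  # concordance within the first year
```

With these toy numbers the model ranks the first-year death perfectly but orders later deaths imperfectly, so the restricted and overall concordances differ, which is exactly the distinction the talk draws.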
Keywords
prediction
concordance
survival
As artificial intelligence (AI) becomes increasingly integrated into healthcare, the need for robust registries and assurance laboratories is critical to ensure transparency, reliability, and continuous monitoring of model performance. This session will explore the ongoing lifecycle of AI validation and adaptation, emphasizing the role of standardized performance metrics such as AUC and c-index in assessing and maintaining model efficacy. We will discuss how structured frameworks for model monitoring can enhance clinical trust, support regulatory compliance, and drive iterative improvements in AI-driven healthcare solutions.
This talk discusses the area under the receiver operating characteristic curve (AUC) of combined biomarkers in a high-dimensional setting. We propose a penalization approach to the inference of precision matrices in the presence of a limit of detection. A new version of the expectation-maximization algorithm is then proposed for the penalized likelihood, using numerical integration and the graphical lasso method. The estimated precision matrix is then applied to the inference of AUCs. The proposed method outperforms existing methods in numerical studies. We apply the proposed method to data from a brain tumor study; the results show higher accuracy in AUC estimation compared with existing methods.
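To make the precision-matrix-to-AUC pipeline concrete, here is a rough sketch under simplifying assumptions. It is not the talk's penalized EM with a limit of detection; it only shows the two generic building blocks on simulated data: a graphical-lasso precision estimate (scikit-learn's `GraphicalLasso`) and Fisher's linear combination w = Ω(μ₁ − μ₀), whose score AUC is then computed with `roc_auc_score`.

```python
import numpy as np
from sklearn.covariance import GraphicalLasso
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

# Simulated biomarker panels for healthy (0) and diseased (1) groups,
# sharing one covariance structure but with shifted means
p = 5
cov = 0.3 * np.ones((p, p)) + 0.7 * np.eye(p)
x0 = rng.multivariate_normal(np.zeros(p), cov, size=200)
x1 = rng.multivariate_normal(0.5 * np.ones(p), cov, size=200)

# Sparse precision-matrix estimate from within-group-centered data
centered = np.vstack([x0 - x0.mean(axis=0), x1 - x1.mean(axis=0)])
precision = GraphicalLasso(alpha=0.05).fit(centered).precision_

# Fisher's linear combination of biomarkers maximizes the AUC
# under multivariate normality with a common covariance
w = precision @ (x1.mean(axis=0) - x0.mean(axis=0))
scores = np.vstack([x0, x1]) @ w
labels = np.r_[np.zeros(len(x0)), np.ones(len(x1))]
auc = roc_auc_score(labels, scores)
print(f"AUC of combined biomarkers: {auc:.3f}")
```

The `alpha` penalty and the simulation settings are arbitrary choices for illustration only.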
Keywords
AUC
c-index
validation
range of predictor variables
Co-Author
Larry Tang, University of Central Florida
Speaker
Larry Tang, University of Central Florida
The Area Under the Receiver Operating Characteristic Curve (AUC) is one of the most widely used metrics for assessing the discriminatory ability of risk prediction models. It possesses several desirable properties: it is a proper scoring rule and remains independent of event rates, making it a valuable tool in model evaluation. However, its limitations are often overlooked. This presentation will focus on key challenges associated with AUC, including pitfalls in using it for external validation and how study design choices can significantly influence its value. We aim to provide a more nuanced perspective on AUC's role in model assessment and guide best practices for its application in clinical research.
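The event-rate independence mentioned above has a simple deterministic demonstration. The sketch below uses made-up risk scores: because the AUC compares only case-control pairs, replicating the controls (which drops the event rate from 50% to 10%) leaves it exactly unchanged.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Hypothetical risk scores for cases (events) and controls (non-events)
cases    = np.array([0.9, 0.8, 0.6, 0.4])
controls = np.array([0.7, 0.5, 0.3, 0.2])

def pair_auc(cases, controls):
    """AUC computed from pooled scores and 0/1 labels."""
    scores = np.r_[cases, controls]
    labels = np.r_[np.ones(len(cases)), np.zeros(len(controls))]
    return roc_auc_score(labels, scores)

# Tiling the controls changes the event rate (4/8 -> 4/40)
# but not the rank-based AUC
print(pair_auc(cases, controls))                # 0.8125
print(pair_auc(cases, np.tile(controls, 9)))    # 0.8125
```

This invariance is a strength for case-control designs, but, as the talk notes, it also means the AUC alone says nothing about calibration or absolute risk in the target population.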
Keywords
AUC
c-index
validation
two-class classification
Speaker
Olga Demler, Brigham and Women's Hospital/Harvard Medical School Boston, USA; Swiss Federal Institute of Technology ETH Zurich, Switzerland