27: Impact of Performance Metrics in AI Model Evaluation
Monday, Aug 4: 2:00 PM - 3:50 PM
1936
Contributed Posters
Music City Center
The selection of a performance metric for model evaluation is not as trivial as it may appear. On one hand, the model commissioners' expectations of the model's contribution to their business objective(s) often lack empirical support. On the other, model developers can easily be overwhelmed by the multitude of quantitative metrics recommended in the statistical literature. Hence the need for a methodology to guide the effective selection of a statistical performance metric during model evaluation. In Salami et al. (2024), we considered a fraud detection use case and showed that F-beta (F_β, β > 1) is more appropriate than F_1 or the Area Under the Precision-Recall Curve (AUPRC) for measuring the model's contribution to the business objective. In this paper, we examine two variants of F_β, namely the weighted F_β and the unweighted F_β, and discuss how selecting one in lieu of the other can lead to erroneous decisions with adverse impacts. As AI algorithms become more prevalent in decision making, our paper brings a new perspective to the selection of statistical performance metrics for evaluating AI models.
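The contrast between weighted and unweighted F_β can be illustrated concretely. The sketch below is a minimal example assuming the distinction follows scikit-learn's averaging conventions ("weighted" averages per-class F_β scores by class support, while "macro" averages them equally); the labels, beta value, and library choice are illustrative and not taken from the paper. Here F_β = (1 + β²) · precision · recall / (β² · precision + recall).

# Minimal sketch: weighted vs. unweighted F_beta under class imbalance,
# assuming scikit-learn's averaging conventions stand in for the paper's
# weighted / unweighted distinction (an assumption, not the authors' code).
from sklearn.metrics import fbeta_score

# Hypothetical fraud-detection labels: 1 = fraud (rare), 0 = legitimate.
y_true = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1]
y_pred = [0, 0, 0, 0, 0, 1, 1, 1, 1, 0]

beta = 2  # beta > 1 weights recall more heavily than precision

# Unweighted (macro) F_beta: per-class scores averaged equally, so the
# rare fraud class counts as much as the majority class.
f_macro = fbeta_score(y_true, y_pred, beta=beta, average="macro")

# Weighted F_beta: per-class scores averaged by class support, so the
# majority (legitimate) class dominates when classes are imbalanced.
f_weighted = fbeta_score(y_true, y_pred, beta=beta, average="weighted")

print(f"unweighted (macro) F_{beta}: {f_macro:.3f}")
print(f"weighted F_{beta}: {f_weighted:.3f}")

Because the weighted average is dominated by the majority class, it can report a flattering score for a model that misses much of the rare fraud class, which is one way the choice between the two variants can drive erroneous decisions.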
artificial intelligence
machine learning
performance evaluation
performance metrics
model testing
model monitoring
performance thresholds
Main Sponsor
Business and Economic Statistics Section