27: Impact of Performance Metrics in AI Model Evaluation

Victor Lo, Co-Author
Fidelity Investments

Youssouf Salami, First Author and Presenting Author
Fidelity Investments

Monday, Aug 4: 2:00 PM - 3:50 PM
1936 
Contributed Posters 
Music City Center 
The selection of a performance metric for model evaluation is not as trivial as it may appear. On one hand, the model commissioners' expectations of the model's contribution to achieving their business objective(s) often lack empirical support. On the other, model developers can easily be overwhelmed by the multitude of quantitative metrics recommended in the statistical literature. Hence the need for a methodology to guide the effective selection of statistical performance metrics during model evaluation. In Salami et al. (2024), we considered a fraud detection use case and showed that the F-beta score (F_β, β > 1) is more appropriate than F_1 or the Area Under the Precision-Recall Curve (AUPRC) for measuring a model's contribution to the business objective. In this paper, we examine two variants of F_β, namely the weighted F_β and the unweighted F_β, and discuss how selecting one in lieu of the other can lead to erroneous decisions with adverse impacts. As AI algorithms become more prevalent in decision making, our paper brings a new perspective to the selection of statistical performance metrics for evaluating AI models.
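For context, the F_β score combines precision P and recall R as F_β = (1 + β²)·P·R / (β²·P + R); with β > 1, recall is weighted more heavily than precision. The sketch below is a minimal illustration, not drawn from Salami et al. (2024): it uses scikit-learn's fbeta_score on synthetic, imbalanced labels (an assumed 10% fraud rate and β = 2 are illustrative choices) to show how a support-weighted F_β can diverge sharply from the unweighted, fraud-class F_β.

```python
# Minimal illustrative sketch (synthetic data, not from the paper) contrasting
# the unweighted positive-class F_beta with a support-weighted F_beta on an
# imbalanced, fraud-style label set.
from sklearn.metrics import fbeta_score

# 100 cases, 10% "fraud": 6 frauds caught (TP), 4 missed (FN), 5 false alarms (FP)
y_true = [0] * 90 + [1] * 10
y_pred = [0] * 85 + [1] * 5 + [1] * 6 + [0] * 4

beta = 2  # beta > 1 emphasizes recall over precision

# Unweighted F_beta, computed on the fraud class alone
f_pos = fbeta_score(y_true, y_pred, beta=beta, average="binary")

# Support-weighted F_beta: per-class scores averaged by class frequency,
# so the dominant non-fraud class dominates the result
f_wtd = fbeta_score(y_true, y_pred, beta=beta, average="weighted")

print(f"positive-class F_{beta}:   {f_pos:.3f}")  # ~0.588
print(f"support-weighted F_{beta}: {f_wtd:.3f}")  # ~0.911
```

Here the weighted score is inflated by the majority non-fraud class, precisely the kind of divergence between the two variants that can mislead metric selection.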

Keywords

artificial intelligence

machine learning

performance evaluation

performance metrics

model testing

model monitoring

performance thresholds


Main Sponsor

Business and Economic Statistics Section