Sources of Prediction Instability in Statistical & Machine Learning Models
Jeffrey Blume
Co-Author
University of Virginia, School of Data Science
Thursday, Aug 7: 9:50 AM - 10:05 AM
1889
Contributed Papers
Music City Center
The emergence of overparameterized models, in which the number of parameters far exceeds the sample size available to train the model, has been accompanied by a near-exclusive focus on summary measures of prediction accuracy. Consequently, the variance and stability of individual-level predictions are often overlooked. While overparameterization provides flexibility, it incurs significant costs: greater variance and prediction instability. We compare the performance of statistical and machine learning models by refitting them under varying circumstances to gauge their stability. We find that instability propagates through fitting routines, optimization targets, model architectures, the effective degrees of freedom, and other design choices. Prediction instability is more pervasive than previously recognized, particularly when machine learning algorithms are applied in data-deficient settings. Analysts should not assume that individual-level predictions are stable when models are retrained or achieve near-equivalent loss optimality. Our study underscores the importance of assessing and minimizing prediction instability before putting a model into production.
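
The sketch below illustrates the kind of refitting exercise the abstract describes: repeatedly retraining a model on perturbed (here, bootstrap-resampled) training data and measuring the spread of its predictions for each held-out individual. It assumes scikit-learn and NumPy; the model, synthetic data, and number of refits are illustrative choices, not the study's actual protocol.

# Minimal sketch: per-individual prediction instability under refitting.
# Model, data, and refit count are illustrative assumptions only.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X, y = make_regression(n_samples=200, n_features=50, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

n_refits = 100
preds = np.empty((n_refits, len(X_test)))
for b in range(n_refits):
    # Refit on a bootstrap resample of the training data.
    idx = rng.integers(0, len(X_train), size=len(X_train))
    model = RandomForestRegressor(n_estimators=100, random_state=b)
    model.fit(X_train[idx], y_train[idx])
    preds[b] = model.predict(X_test)

# Instability for each individual: SD of its predictions across refits.
instability = preds.std(axis=0)
print("mean per-individual prediction SD:", instability.mean())
print("max  per-individual prediction SD:", instability.max())

Swapping in a different model class (e.g., a penalized linear model) and comparing the resulting per-individual spreads is one way to operationalize the statistical-versus-machine-learning comparison described above.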
Keywords: prediction, stability, machine-learning, variance, uncertainty
Main Sponsor: Section on Statistical Learning and Data Science