Real World UQ Quality Assessment via Surrogate Oracle Modeling
Tuesday, Aug 6: 11:15 AM - 11:35 AM
Invited Paper Session
Oregon Convention Center
The gold standard for evaluating the quality of prediction intervals produced by data-driven classification models is to compute frequentist validity and efficiency statistics relative to a ground-truth oracle distribution. The oracle distribution is required to determine whether a predicted class-probability interval contains the true class probability, which is what makes validity assessments accurate. Real-world data have no 'true' oracle distribution, which leads us to ask: can a surrogate of the oracle model (SOM) be used in place of the oracle, and can the same metrics then be computed? We investigate the feasibility of using SOMs when the underlying data distribution is unavailable. Specifically, we use generative methods to learn a distribution over the data and ask whether a SOM exists and, if so, how good the learned distribution is. Can a SOM be used to rank UQ-enabled models in lieu of the oracle model? Our experiments show that such a SOM does exist and that it can provide coverage and validity estimates with small error relative to the true values obtained from an oracle model, effectively enabling real-world model selection based on UQ quality.
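To make the validity and efficiency statistics concrete, here is a minimal sketch, assuming each UQ-enabled model outputs a per-example interval [lower, upper] for the probability of one class, and that a SOM supplies a per-example probability that stands in for the oracle's true class probability. The function names (`interval_metrics`, `som_coverage_error`, `rank_models`), the target-coverage ranking rule, and the toy numbers are hypothetical illustrations, not the authors' method or data. Note that efficiency (mean interval width) does not depend on the reference probabilities at all; only validity (coverage) needs an oracle or a surrogate.

```python
import numpy as np

def interval_metrics(lower, upper, p_ref):
    """Return (coverage, mean_width) of intervals [lower, upper] against p_ref.

    coverage: fraction of examples whose reference class probability p_ref
              lies inside the predicted interval (validity).
    mean_width: average interval length (efficiency); independent of p_ref.
    """
    lower, upper, p_ref = (np.asarray(a, dtype=float) for a in (lower, upper, p_ref))
    covered = (lower <= p_ref) & (p_ref <= upper)
    return float(covered.mean()), float((upper - lower).mean())

def som_coverage_error(lower, upper, p_oracle, p_som):
    """Absolute gap between coverage computed with the SOM and with the oracle."""
    cov_oracle, _ = interval_metrics(lower, upper, p_oracle)
    cov_som, _ = interval_metrics(lower, upper, p_som)
    return abs(cov_som - cov_oracle)

def rank_models(models, p_ref, target=0.9):
    """Order models by |coverage - target|, breaking ties by narrower mean width.

    Hypothetical ranking rule chosen for illustration only.
    """
    scored = []
    for name, (lower, upper) in models.items():
        cov, width = interval_metrics(lower, upper, p_ref)
        scored.append((abs(cov - target), width, name))
    return [name for _, _, name in sorted(scored)]

# Toy data invented for illustration: oracle probabilities and a SOM's estimates.
p_oracle = np.array([0.10, 0.40, 0.55, 0.80, 0.95])
p_som = np.array([0.12, 0.38, 0.57, 0.78, 0.93])
models = {
    "A": (np.array([0.05, 0.30, 0.45, 0.70, 0.85]), np.array([0.20, 0.50, 0.65, 0.90, 1.00])),
    "B": (np.array([0.00, 0.39, 0.60, 0.85, 0.70]), np.array([0.05, 0.45, 0.70, 0.95, 0.90])),
}

for name, (lo, hi) in models.items():
    print(name, interval_metrics(lo, hi, p_oracle), interval_metrics(lo, hi, p_som),
          som_coverage_error(lo, hi, p_oracle, p_som))
print(rank_models(models, p_oracle), rank_models(models, p_som))
```

In this toy case the SOM misestimates model B's coverage, yet the model ranking matches the oracle ranking; comparing per-model coverage error and ranking agreement in this way is one plausible reading of what "using a SOM in lieu of the oracle" would require.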