A statistical perspective on uncertainty quantification for deep learning

Abstract Number:

1057 

Submission Type:

Invited Paper Session 

Participants:

Daniel Ries (1), Jason Adams (1), Natalie Klein (2), Trevor Harris (3), Feng Liang (4), Karl Pazdernik (5), Joshua Michalenko (1)

Institutions:

(1) Sandia National Laboratories, (2) Los Alamos National Laboratory, (3) Texas A&M University, (4) University of Illinois at Urbana-Champaign, (5) Pacific Northwest National Laboratory

Chair:

Jason Adams  
Sandia National Laboratories

Session Organizer:

Daniel Ries  
Sandia National Laboratories

Speaker(s):

Natalie Klein  
Los Alamos National Laboratory
Trevor Harris  
Texas A&M University
Feng Liang  
University of Illinois at Urbana-Champaign
Karl Pazdernik  
Pacific Northwest National Laboratory
Joshua Michalenko  
Sandia National Laboratories

Session Description:

Focus
Deep learning (DL) models are now ubiquitous in industry, academia, and government. However, these models provide no natural form of uncertainty quantification (UQ); they produce only a point estimate. This is problematic in high-consequence applications, since a proper risk assessment requires an understanding of variability. This session covers several recent advances in UQ for DL, including methodological developments and comparisons and assessments of current approaches.
Content
Bayesian neural networks (BNNs) have been at the forefront of UQ for deep learning models. We explore different inference methods and prior choices, and how these influence the posterior predictive distribution of a BNN, as well as model selection via the Bayesian model evidence. We investigate connections with Gaussian process models via the linearized Laplace approximation and apply these methods to the Mars Curiosity rover ChemCam spectral data set.
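The BNN posterior predictive referred to above can be sketched, in standard notation, as a marginalization over the weight posterior, typically approximated by Monte Carlo draws from an approximate posterior (such as one produced by a Laplace approximation); the symbols below are generic and not taken from the talks themselves:

```latex
p(y^{\ast} \mid x^{\ast}, \mathcal{D})
  = \int p(y^{\ast} \mid x^{\ast}, \theta)\, p(\theta \mid \mathcal{D})\, d\theta
  \;\approx\; \frac{1}{S} \sum_{s=1}^{S} p(y^{\ast} \mid x^{\ast}, \theta_{s}),
  \qquad \theta_{s} \sim q(\theta \mid \mathcal{D}),
```

where q denotes the approximate posterior obtained from the chosen inference method; the influence of the prior and the inference scheme enters through q.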
We explore deep kernel learning for combining ensembles of climate model output into a single forecast, along with a new way to quantify the uncertainty of those predictions. For UQ, we introduce a conformal prediction approach that yields spatially varying predictive variance without an explicit probabilistic model. The conformal ensembles we generate achieve improved coverage and lower continuous ranked probability scores (CRPS) than the baseline Gaussian process predictive distribution.
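As background for the conformal prediction idea above, here is a minimal split-conformal sketch in Python; it is a generic illustration (the function name and setup are hypothetical, not the talk's method), showing how a point predictor plus held-out residuals yields intervals with finite-sample marginal coverage and no explicit probabilistic model:

```python
import numpy as np

def split_conformal_interval(cal_residuals, y_pred, alpha=0.1):
    """Split conformal prediction (generic sketch, not the speakers' method).

    cal_residuals: |y_i - f(x_i)| on a held-out calibration set
    y_pred: point predictions for new inputs
    alpha: target miscoverage (0.1 -> 90% intervals)
    """
    n = len(cal_residuals)
    # Conformal quantile with the (n + 1)/n finite-sample correction
    level = np.ceil((n + 1) * (1 - alpha)) / n
    q = np.quantile(cal_residuals, level, method="higher")
    # Symmetric interval around each point prediction
    return y_pred - q, y_pred + q
```

The coverage guarantee is marginal (on average over inputs); the talk's conformal-ensemble construction goes further by making the predictive spread vary spatially.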
Calibrating DL models is an important step because these large, highly parameterized models are known to drift, and retraining them can be expensive. We present a new calibration method for regression models that produces better point predictions, as measured by mean squared error, and better UQ, as measured by CRPS. We also explore the calibration of deep learning architectures and the construction of credible intervals, specifically for language models and natural language processing tasks.
Many UQ for DL approaches exist, but there has been little work on evaluating and comparing them. Furthermore, traditional metrics like empirical coverage are not directly applicable to classification problems. We will discuss a generative approach to creating underlying "truth" probabilities for classification problems, along with a novel way of generating probability spaces for classification problems that can be used to assess UQ quality.
Timeliness
The AI boom is here and is not going anywhere soon. Since the rollout of ChatGPT, public awareness of such models and their capabilities has been higher than ever. More decisions and analyses will be pushed to black-box models, increasing the demand for proper UQ so that users can understand where and when models are less confident. The field of UQ for DL is advancing rapidly across many domains, and a statistical perspective at JSM would provide a common meeting ground for those working in this area to exchange ideas.
Appeal
This session shows there are ample opportunities for statisticians to engage in research on uncertainty quantification for deep learning. The arena is heavily dominated by machine learning experts, meaning there is room for statisticians to contribute their probabilistic backgrounds to the research and development of new methods. Additionally, the session includes applications that let practitioners see UQ used in practice.

Sponsors:

No Additional Sponsor 3
Journal on Uncertainty Quantification 3
Section on Statistical Learning and Data Science 2
Section on Statistics in Defense and National Security 1

Theme: Statistics and Data Science: Informing Policy and Countering Misinformation

Yes

Applied

No

Estimated Audience Size

Medium (80-150)

I have read and understand that JSM participants must abide by the Participant Guidelines.

Yes

I understand and have communicated to my proposed speakers that JSM participants must register and pay the appropriate registration fee by June 1, 2024. The registration fee is nonrefundable.

I understand