Advances in Inference and Theory for Bayesian Neural Networks

Natalie Klein, Organizer
Los Alamos National Laboratory

Giosue Migliorini, Organizer
University of California, Irvine
Thursday, Aug 8: 8:30 AM - 10:20 AM
1213 
Invited Paper Session 
Oregon Convention Center 
Room: CC-255 
This session focuses on recent advances in inference and theory for Bayesian neural networks. Neural networks exhibit remarkable flexibility as parametric models and have recently been used to achieve impressive results in tasks as varied as realistic image generation, automation (e.g., self-driving cars), and natural language comprehension. As neural network models are increasingly applied to high-consequence areas such as medical and physical sciences, it is critical to better understand the behavior of neural network models and to interrogate the utility of prevailing approximate uncertainty quantification techniques.

A Bayesian treatment offers a principled approach to uncertainty quantification and model selection, and the ability of Bayesian models to capture epistemic uncertainty qualifies Bayesian inference as a lens on the generalization abilities of neural networks. Some of the talks in this session will investigate how such properties can be translated into algorithmic procedures for model selection and will explore the limitations of the marginal likelihood as a proxy for out-of-sample generalization.

Traditionally, Bayesian methods for neural networks involve establishing a prior distribution on the parameters. However, selecting informative priors is challenging, and the complex, high-dimensional structure of contemporary neural networks poses difficulties in posterior inference. Talks in this session will explore obstacles and promising directions in approximate posterior inference, including scalable algorithms, partially stochastic neural networks, and function space priors.

The invited speakers are leaders in the field of scalable inference methods for Bayesian neural networks, and their work lies at the intersection of statistics and machine learning, with recent work appearing in high-profile machine learning venues. Thus, this session offers not only the opportunity for statisticians to learn about the latest advances in the Bayesian treatment of neural network models, but also the opportunity for leading machine learning researchers to connect more deeply with the statistics community.

Applied: No

Main Sponsor

Section on Physical and Engineering Sciences

Co-Sponsors

IMS
Section on Bayesian Statistical Science
Section on Statistical Computing

Presentations

Bayesian Neural Model Selection for Symmetry Learning

Recent advances in scalable Bayesian inference have enabled Bayesian model selection for deep neural networks. Optimizing the marginal likelihood embodies an Occam's razor effect, allowing neural network hyperparameters to be learned from training data. This enables automatic adaptation of neural architectures and differentiable learning of inductive biases from data. In this talk, we discuss how recent advances in approximate inference, such as the Laplace approximation and non-mean-field variational inference, can provide differentiable estimates of the marginal likelihood that scale to large models and datasets. We present promising examples demonstrating scalable Bayesian model selection to learn invariances and layer-wise equivariances, adapt neural architectures and inductive biases, and automatically discover conserved quantities and associated symmetries in physical systems.
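As a toy illustration of the Laplace approximation to the marginal likelihood mentioned in this abstract (a minimal sketch under simplifying assumptions, not the speaker's method), the following estimates the log evidence of a one-parameter conjugate Gaussian model, where the approximation happens to be exact:

```python
import numpy as np

def laplace_log_evidence(log_joint, theta_map, hessian):
    # Laplace estimate: log Z ~ log p(D, theta*) + (d/2) log 2*pi - (1/2) log|H|,
    # where H is the Hessian of the negative log joint at the MAP theta*.
    d = len(theta_map)
    _, logdet = np.linalg.slogdet(hessian)
    return log_joint(theta_map) + 0.5 * d * np.log(2 * np.pi) - 0.5 * logdet

# Toy model (illustrative, not from the talk): y_i ~ N(theta, 1), theta ~ N(0, 1).
y = np.array([0.3, -0.1, 0.8, 0.5])
n = len(y)
log_joint = lambda t: (-0.5 * np.sum((y - t[0]) ** 2) - 0.5 * t[0] ** 2
                       - 0.5 * (n + 1) * np.log(2 * np.pi))
theta_map = np.array([y.sum() / (n + 1)])     # closed-form MAP
hessian = np.array([[n + 1.0]])               # closed-form Hessian
approx = laplace_log_evidence(log_joint, theta_map, hessian)

# Exact evidence for comparison: marginally, y ~ N(0, I + 11^T).
C = np.eye(n) + np.ones((n, n))
_, logdetC = np.linalg.slogdet(C)
exact = -0.5 * (n * np.log(2 * np.pi) + logdetC + y @ np.linalg.solve(C, y))
```

Because the log joint is quadratic in theta, the Gaussian (Laplace) integral is exact here; for neural networks the same formula is only an approximation, evaluated at a mode found by optimization.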

Speaker

Tycho van der Ouderaa

Functional Priors for Bayesian Deep Learning

The impressive predictive performance of Deep Learning (DL) has fueled hopes that it can help address societal challenges by supporting sound decision making. However, many open questions remain about its suitability to live up to this promise. In this talk, I will discuss some of the current limitations of DL, which directly affect its wide adoption. I will focus in particular on the poor ability of DL models to quantify uncertainty in predictions, and I will present Bayesian DL as an attractive approach that combines the flexibility of DL with probabilistic reasoning. I will then discuss the challenges associated with carrying out inference and specifying sensible priors for DL models. After presenting some recent contributions that address these problems, I will conclude with some interesting emerging research trends and open problems.

Speaker

Maurizio Filippone, EURECOM

Is Bayesian Model Selection Aligned with Model Generalization?

How do we compare between hypotheses that are entirely consistent with observations? The marginal likelihood (aka Bayesian evidence), which represents the probability of generating our observations from a prior, provides a distinctive approach to this foundational question, automatically encoding Occam's razor. Although it has been observed that the marginal likelihood can overfit and is sensitive to prior assumptions, its limitations for hyperparameter learning and discrete model comparison have not been thoroughly investigated. We first revisit the appealing properties of the marginal likelihood for learning constraints and hypothesis testing. 
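As a concrete illustration of the Occam's razor effect described here (a hypothetical toy example, not drawn from the talk), the closed-form evidence of a conjugate Bayesian linear regression penalizes a needlessly flexible polynomial model on data generated by a simple linear trend:

```python
import numpy as np

def log_evidence(y, Phi, noise_var=0.01, prior_var=1.0):
    # Closed-form log marginal likelihood of Bayesian linear regression:
    # with w ~ N(0, prior_var * I), marginally y ~ N(0, noise_var*I + prior_var*Phi Phi^T).
    n = len(y)
    C = noise_var * np.eye(n) + prior_var * Phi @ Phi.T
    _, logdet = np.linalg.slogdet(C)
    return -0.5 * (n * np.log(2 * np.pi) + logdet + y @ np.linalg.solve(C, y))

x = np.linspace(-1, 1, 20)
y = 0.5 * x                                            # data from a linear trend
poly = lambda d: np.vander(x, d + 1, increasing=True)  # degree-d polynomial features
evidences = {d: log_evidence(y, poly(d)) for d in (1, 5)}
# The degree-5 model also fits perfectly, but its prior spreads probability over
# many more functions, so the evidence favors the simpler degree-1 hypothesis.
```

Both models are "entirely consistent with observations" in the sense of the abstract; the evidence still discriminates between them by how much prior probability each assigns to the observed data.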

Speaker

Andrew Wilson, NYU

Scaling Up Bayesian Neural Networks with Neural Networks

Bayesian Neural Networks (BNNs) offer a more principled, robust, and interpretable framework for analyzing high-dimensional data. They address typical challenges associated with conventional deep learning methods, such as an insatiable appetite for data, their ad hoc nature, and susceptibility to overfitting. However, their implementation typically relies on Markov chain Monte Carlo (MCMC) methods, which are computationally intensive and inefficient in high-dimensional spaces. To address this issue, we propose a Calibration-Emulation-Sampling (CES) strategy to significantly enhance the computational efficiency of BNNs. In the initial calibration stage of this CES framework, we collect a small set of samples from the parameter space. These samples serve as training data for the emulator: a Deep Neural Network (DNN) that approximates the forward mapping, i.e., the process by which input data pass through the network's layers to generate predictions. Using simulated and real data, we demonstrate that the proposed method improves the computational efficiency of BNNs while maintaining similar performance in terms of accuracy and uncertainty quantification.
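A minimal sketch of the calibrate-emulate-sample pattern described above, with a trivial one-dimensional stand-in for an expensive BNN log-posterior and a quadratic fit standing in for the DNN emulator (all settings here are illustrative assumptions, not the speakers' implementation):

```python
import numpy as np

rng = np.random.default_rng(0)

def expensive_log_post(theta):
    # Stand-in for a costly BNN log-posterior evaluation (here just N(0, 1)).
    return -0.5 * theta ** 2

# 1) Calibration: a small set of parameter / log-posterior pairs.
theta_cal = np.linspace(-3, 3, 15)
logp_cal = expensive_log_post(theta_cal)

# 2) Emulation: fit a cheap surrogate to the calibration pairs
#    (a quadratic here; a DNN in the actual CES framework).
emulator = np.poly1d(np.polyfit(theta_cal, logp_cal, deg=2))

# 3) Sampling: random-walk Metropolis driven entirely by the cheap emulator.
samples, theta = [], 0.0
for _ in range(5000):
    prop = theta + rng.normal(scale=1.0)
    if np.log(rng.uniform()) < emulator(prop) - emulator(theta):
        theta = prop
    samples.append(theta)
samples = np.array(samples)
```

The expensive target is evaluated only 15 times during calibration; the 5000 MCMC steps touch only the surrogate, which is where the computational savings come from.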

Speaker

Babak Shahbaba, UCI

The Boons of Being Less Bayesian: a study of partially stochastic neural networks

Bayesian approaches have the potential to mitigate problems with neural networks (NNs) such as overconfidence and lack of robustness. However, computation is a major obstacle to performing high-fidelity posterior inference. In this talk, I will first present our research on scalable variational approximations based on subnetworks: only a subset of the NN is given a Bayesian treatment, and we find this is enough to perform competitive uncertainty estimation. I will then further justify subnetwork inference, not simply for its computational benefits, but from the theoretical insight that these NNs have as rich a posterior predictive distribution as fully stochastic NNs. Moreover, across various inference schemes, we observe no empirical benefit to using fully stochastic NNs. I will close by questioning whether a fully Bayesian treatment of NNs can ever be beneficial.
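To give a flavor of partially stochastic inference, the sketch below (an illustrative simplification under assumed settings, not the speaker's method) keeps a random-feature network "body" at fixed point estimates and gives only the last linear layer a Bayesian treatment, which is then available in closed form:

```python
import numpy as np

rng = np.random.default_rng(1)

# Deterministic "body": a fixed random-feature network (point estimates only).
W, b = rng.normal(size=(1, 50)), rng.normal(size=50)
features = lambda x: np.tanh(x[:, None] @ W + b)

x_train = np.linspace(-2, 2, 40)
y_train = np.sin(2 * x_train) + 0.1 * rng.normal(size=40)

# Bayesian subnetwork: conjugate linear-Gaussian posterior over the last layer.
Phi = features(x_train)
noise_var, prior_var = 0.01, 1.0                       # assumed hyperparameters
S_inv = Phi.T @ Phi / noise_var + np.eye(50) / prior_var   # posterior precision
mean_w = np.linalg.solve(S_inv, Phi.T @ y_train) / noise_var

def predict(x):
    phi = features(x)
    mean = phi @ mean_w
    # Predictive variance: noise + phi^T S phi, with S the posterior covariance.
    var = noise_var + np.einsum('ij,ij->i', phi, np.linalg.solve(S_inv, phi.T).T)
    return mean, var

mean, var = predict(np.array([0.0, 5.0]))   # in-distribution vs. far from the data
```

Even with the body frozen, the stochastic last layer yields epistemic uncertainty that grows away from the training data, consistent with the abstract's claim that a Bayesian treatment of a subnetwork can already deliver competitive uncertainty estimation.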

Speaker

Eric Nalisnick, Johns Hopkins University

Wide mean-field Bayesian neural networks ignore the data

Bayesian neural networks (BNNs) combine the expressive power of deep learning with the advantages of Bayesian formalism. In recent years, the analysis of wide, deep BNNs has provided theoretical insight into their priors and posteriors. However, we have no analogous insight into their posteriors under approximate inference. We show that mean-field variational inference entirely fails to model the data when the network width is large and the activation function is odd. Specifically, for fully-connected BNNs with odd activation functions and a homoscedastic Gaussian likelihood, we show that the optimal mean-field variational posterior predictive (i.e., function space) distribution converges to the prior predictive distribution as the width tends to infinity. We generalize aspects of this result to other likelihoods. Our theoretical results are suggestive of underfitting behavior previously observed in BNNs. Finally, we show that the optimal approximate posterior need not tend to the prior if the activation function is not odd, demonstrating that our results cannot be generalized arbitrarily.

Speaker

Beau Coker, Harvard University