Advances in Loss-based, Penalized and Variational Bayesian Methods

Soham Ghosh, Chair
University of Wisconsin, Madison
 
Wednesday, Aug 6: 10:30 AM - 12:20 PM
Session 4151: Contributed Papers
Music City Center, Room CC-103B

Main Sponsor

Section on Bayesian Statistical Science

Presentations

A Bayesian decision-theoretic approach to sparse estimation

We extend the work of Hahn & Carvalho (2015) and develop a doubly-regularized sparse regression estimator by synthesizing Bayesian regularization with penalized least squares within a decision-theoretic framework. In contrast to existing Bayesian decision-theoretic formulations, which rely chiefly on the symmetric 0-1 loss, the new method, which we call Bayesian Decoupling, employs a family of penalized loss functions indexed by a sparsity-tuning parameter. We propose a class of reweighted l1 penalties, with two specific instances that achieve simultaneous bias reduction and convexity. The design of the penalties incorporates considerations of signal sizes, as enabled by the Bayesian paradigm. The tuning parameter is selected using a posterior benchmarking criterion, which quantifies the drop in predictive power relative to the optimal Bayes estimator under squared error loss. Additionally, in contrast to the widely used median probability model technique, which selects variables by thresholding posterior inclusion probabilities at the fixed value of 1/2, Bayesian Decoupling enables a data-driven threshold that automatically adapts to estimated signal sizes. 
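A minimal sketch of the decoupling idea, under stated assumptions: the posterior draws below are faked around a ridge fit purely for illustration, the reweighting rule 1/(|posterior mean| + eps) is one simple member of the reweighted-l1 family, ISTA stands in for whatever solver the authors use, and the "drop" printout only gestures at the posterior benchmarking criterion.

import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 10
X = rng.standard_normal((n, p))
beta_true = np.r_[3.0, -2.0, np.zeros(p - 2)]
y = X @ beta_true + rng.standard_normal(n)

# Stand-in for posterior draws of beta (in practice, MCMC under a shrinkage prior);
# here they are faked around a ridge estimate purely for illustration.
beta_ridge = np.linalg.solve(X.T @ X + np.eye(p), X.T @ y)
draws = beta_ridge + 0.1 * rng.standard_normal((1000, p))
beta_bar = draws.mean(axis=0)                 # Bayes estimator under squared error
weights = 1.0 / (np.abs(beta_bar) + 1e-3)     # reweighted l1: large signals, small penalty

def weighted_lasso(X, target, lam, w, n_iter=500):
    """ISTA for 0.5*||target - X b||^2 + lam * sum_j w_j |b_j|."""
    L = np.linalg.norm(X, 2) ** 2             # Lipschitz constant of the quadratic part
    b = np.zeros(X.shape[1])
    for _ in range(n_iter):
        z = b - X.T @ (X @ b - target) / L    # gradient step
        b = np.sign(z) * np.maximum(np.abs(z) - lam * w / L, 0.0)  # soft threshold
    return b

y_bar = X @ beta_bar                          # fitted values of the Bayes estimator
for lam in [0.1, 1.0, 10.0]:                  # sweep the sparsity-tuning parameter
    b = weighted_lasso(X, y_bar, lam, weights)
    drop = np.sum((y_bar - X @ b) ** 2) / np.sum(y_bar ** 2)  # crude "predictive drop" proxy
    print(f"lam={lam:5.1f}  nonzeros={np.count_nonzero(b)}  drop={drop:.4f}")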

Keywords

Decision theory

Loss function

Model selection

Penalized least squares

Sparse estimation

Tuning parameter selection 

Co-Author(s)

Surya Tokdar, Duke University
Jason Xu, Duke University

First Author

Aihua Li

Presenting Author

Aihua Li

A Class of Non-separable Penalty Functions for Bayesian Lasso-like Regression

Non-separable penalty functions are often used in regression modeling to enforce group sparsity structure, reduce the influence of unusual features, and improve estimation and prediction by providing a more realistic match between model and data. From a Bayesian perspective, such penalty functions correspond to a lack of (conditional) prior independence among the regression coefficients. We describe a class of prior distributions for regression coefficients that generates non-separable penalty functions. The priors have connections to L1-norm penalization and the Bayesian lasso (BL) and elastic net (BEN) regression models. The regularization properties of the class of priors can be understood both by studying its tunable parameters directly and via the connections to BL and BEN regression. We discuss full Bayesian inference under these priors and variable selection via Bayes factors and posterior model probabilities. Inference and prediction under the class of priors are shown to perform competitively under a range of example data structures. 
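The abstract does not give the prior's functional form, so the sketch below only illustrates the generic correspondence penalty(beta) = -log prior(beta) with one hypothetical non-separable member: an l1 term plus a quadratic coupling beta'C beta with off-diagonal C. Setting C = 0 recovers a Bayesian-lasso-type penalty and C proportional to the identity an elastic-net-type one; the specific C here is an illustrative assumption.

import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
n, p = 50, 4
X = rng.standard_normal((n, p))
y = X @ np.array([2.0, 2.0, 0.0, 0.0]) + rng.standard_normal(n)

lam1 = 2.0
C = 0.2 * np.eye(p) + 0.3 * np.ones((p, p))   # hypothetical positive-definite coupling

def neg_log_post(beta):
    # -log likelihood - log prior (up to constants); the beta' C beta term has
    # off-diagonal entries, so the implied prior is non-separable across coefficients.
    return (0.5 * np.sum((y - X @ beta) ** 2)
            + lam1 * np.sum(np.abs(beta)) + beta @ C @ beta)

map_fit = minimize(neg_log_post, np.zeros(p), method="Nelder-Mead").x
print("MAP under the non-separable penalty:", np.round(map_fit, 3))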

Keywords

Bayesian elastic net

Bayesian lasso

Penalized regression 

First Author

Christopher Hans, The Ohio State University

Presenting Author

Christopher Hans, The Ohio State University

Applying Multi-objective Bayesian Optimization to Likelihood-Free Inference

Scientific statistical models are often defined by generative processes for simulating synthetic data, but many, such as the sequential sampling models (SSMs) used in psychology and consumer behavior, involve intractable likelihoods. Likelihood-free inference (LFI) methods address this challenge, enabling Bayesian parameter inference for such models. We propose to apply Multi-objective Bayesian Optimization (MOBO) to LFI for parameter estimation from multi-source data, such as estimating SSM parameters from response times and choice outcomes. This approach models the discrepancy for each data source separately, using MOBO to efficiently approximate the joint likelihood. The multivariate approach also identifies conflicting information from different data sources and provides insight into their relative importance in estimating individual parameters. We demonstrate the advantages of MOBO over single-discrepancy methods through a synthetic data example and a real-world application evaluating ride-hailing drivers' preferences for electric vehicle rentals in Singapore. While focused on SSMs, our method generalizes to likelihood-free calibration for other multi-source models. 
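A minimal sketch of the ingredients, with stated assumptions: the toy simulator and per-source discrepancies are hypothetical stand-ins for an SSM, and plain random search with a Pareto filter replaces the MOBO loop (a real implementation would fit one Gaussian-process surrogate per discrepancy and choose new parameters with a multi-objective acquisition such as expected hypervolume improvement).

import numpy as np

rng = np.random.default_rng(2)

def simulate(theta, n=200):
    """Toy two-source simulator standing in for an SSM: 'response times' and 'choices'."""
    drift, noise = theta
    rt = np.abs(1.0 / (drift + 0.1) + noise * rng.standard_normal(n))
    choice = rng.random(n) < 1.0 / (1.0 + np.exp(-drift))
    return rt, choice

theta_true = np.array([1.0, 0.5])
rt_obs, ch_obs = simulate(theta_true)

def discrepancies(theta):
    rt, ch = simulate(theta)
    return (abs(np.median(rt) - np.median(rt_obs)),   # source 1: response times
            abs(ch.mean() - ch_obs.mean()))           # source 2: choice outcomes

# Random search in place of the MOBO loop: evaluate candidates and keep those that
# are Pareto-optimal across the two per-source discrepancies; conflicting
# information between sources shows up as a spread-out Pareto front.
cands = rng.uniform([0.1, 0.1], [3.0, 2.0], size=(300, 2))
D = np.array([discrepancies(t) for t in cands])
pareto = [i for i in range(len(cands)) if not np.any(np.all(D < D[i], axis=1))]
print("Pareto-optimal parameter candidates:")
print(np.round(cands[pareto][:5], 2))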

Keywords

Likelihood-Free Inference

Sequential Sampling Models

Multi-objective Bayesian Optimization 

Co-Author(s)

Xinwei Li, National University of Singapore
Eui-jin Kim, Ajou University
Prateek Bansal, National University of Singapore
David Nott, National University of Singapore

First Author

Zichuan Chen, National University of Singapore

Presenting Author

Zichuan Chen, National University of Singapore

Bayesian methods, meaningful parameters, and the importance of calibration

Parametric Bayesian models are specified by a prior distribution over the parameter. In simple models, the parameter vector is low-dimensional and the posterior concentrates around the "truth" at an appropriate rate, provided the model is exactly right for the data. However, the models behave differently when the stream of data arises from a distribution that lies outside the parametric family under consideration. In this case, analyses typically show mixed asymptotic performance: although the Bayes estimator may be consistent for the parameter of interest, Bayes estimators for nuisance parameters are inconsistent. As a consequence, credible intervals do not cover the parameter of interest at the nominal rate, even asymptotically. This phenomenon is well known for Bayesian versions of quantile regression, an important exemplar of the generalized Bayes technology.

This talk examines the phenomenon of miscalibration of misspecified models. We advocate the use of meaningful parameters, construct families of robust models that are indexed by these parameters, discuss the relationship between prior distribution and sensitivity analysis, and suggest methods for handling calibration. 
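For concreteness, a minimal sketch of the generalized-Bayes (Gibbs) posterior for quantile regression, the exemplar mentioned above: pi_w(beta | y) proportional to pi(beta) exp(-w * sum_i rho_tau(y_i - x_i'beta)), sampled here by random-walk Metropolis. The data-generating process, prior scale, and learning rate w are illustrative assumptions; the dependence of the credible interval on w is exactly where calibration questions arise, since the check loss carries no likelihood-based scale.

import numpy as np

rng = np.random.default_rng(3)
n = 200
x = rng.standard_normal(n)
y = 1.0 + 2.0 * x + rng.standard_normal(n)   # the truth need not lie in any assumed family
tau, w = 0.5, 1.0                            # quantile level and learning rate

def rho(u):                                  # check (quantile) loss
    return u * (tau - (u < 0))

def log_gibbs_post(beta):
    resid = y - beta[0] - beta[1] * x
    return -w * np.sum(rho(resid)) - 0.5 * beta @ beta / 100.0   # vague normal prior

beta, lp, samples = np.zeros(2), -np.inf, []
for _ in range(5000):                        # random-walk Metropolis
    prop = beta + 0.1 * rng.standard_normal(2)
    lp_prop = log_gibbs_post(prop)
    if np.log(rng.random()) < lp_prop - lp:
        beta, lp = prop, lp_prop
    samples.append(beta)
samples = np.array(samples[1000:])           # drop burn-in
print("posterior means:", np.round(samples.mean(axis=0), 3))
print("95% credible interval for the slope:",
      np.round(np.percentile(samples[:, 1], [2.5, 97.5]), 3))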

Keywords

Bayes

misspecified model

sensitivity analysis

generalized Bayes

robust model 

Co-Author

Hang Joon Kim, University of Cincinnati

First Author

Juhee Lee, UC Santa Cruz

Presenting Author

Steven MacEachern, The Ohio State University

Sample continuation in Bayesian hierarchical models via variational inference

Posterior distributions in ill-posed Bayesian inverse problems are often analytically intractable and highly sensitive to prior assumptions. We study how a sample representation of the posterior evolves as prior parameters change, enabling sensitivity analysis for small perturbations and solution continuation for larger shifts. Our focus is on a class of non-conjugate hierarchical models that promote sparsity in linear inverse problems. These models, parameterized by a small set of shape parameters, encompass most classical sparsity-promoting priors. As parameters change, the posterior transitions from a tractable unimodal to an intractable multimodal distribution. To track these changes, we use Stein Variational Gradient Descent augmented with Birth-Death sampling, allowing efficient mass exchange between modes while optimizing kernel bandwidth. Our approach effectively samples multimodal posteriors and provides robust sensitivity analysis, as demonstrated in experimental results. 
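A minimal sketch of the Stein Variational Gradient Descent core on a one-dimensional bimodal target; the birth-death augmentation and bandwidth optimization described in the abstract are omitted, and the median-heuristic bandwidth and step size are illustrative choices.

import numpy as np

rng = np.random.default_rng(4)

def grad_log_p(x):
    """Score of a bimodal target: 0.5*N(-2,1) + 0.5*N(2,1)."""
    a = np.exp(-0.5 * (x + 2) ** 2)
    b = np.exp(-0.5 * (x - 2) ** 2)
    return (-(x + 2) * a - (x - 2) * b) / (a + b)

x = rng.standard_normal(50)                    # particles
for _ in range(500):
    diff = x[:, None] - x[None, :]             # diff[i, j] = x_i - x_j
    h = np.median(diff ** 2) / np.log(len(x))  # median-heuristic bandwidth
    K = np.exp(-diff ** 2 / h)                 # RBF kernel k(x_j, x_i)
    gradK = 2.0 * diff / h * K                 # d/dx_j k(x_j, x_i): the repulsive term
    # SVGD update: phi(x_i) = mean_j [ k(x_j,x_i) grad log p(x_j) + grad_{x_j} k(x_j,x_i) ]
    x = x + 0.1 * (K @ grad_log_p(x) + gradK.sum(axis=1)) / len(x)
print("particle mass near the two modes:", np.mean(x < 0), np.mean(x > 0))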

Keywords

Bayesian Hierarchical model

Variational Inference

Distribution Evolution 

Co-Author(s)

Zilai Si, Northwestern University
Alexander Strang, University of California, Berkeley

First Author

Yucong Liu, Georgia Institute of Technology

Presenting Author

Yucong Liu, Georgia Institute of Technology

Loss-Based Bayesian Clustering for Big Data using Splinters

We propose a Bayesian method to cluster large datasets where obtaining samples from the full posterior distribution is impractical. In Bayesian inference, an estimator is chosen by introducing a loss function and reporting the Bayes rule that minimizes its posterior expectation. Except in trivially small cases, this expectation must be approximated, typically using posterior samples. However, standard algorithms scale poorly, making it difficult to fit models with tens of thousands of items. We address the "big data" setting, where posterior sampling is infeasible, by splitting the data into overlapping subsets of manageable size for existing MCMC algorithms. The model is fit to each subset independently, generating several sets of posterior samples. Our goal is to use these samples to estimate a partition that approximates the one minimizing the full model's posterior expectation. The subset size, number of subsets, and degree of overlap are key tuning parameters, which we explore. 
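A minimal sketch of one way such subset samples could be combined (an assumption-laden stand-in, not the authors' splinters procedure): average co-clustering indicators from each subset's draws into a pairwise similarity matrix on the overlaps, then exploit the fact that Binder loss with equal misclassification costs favors clustering pairs whose co-clustering probability exceeds 1/2.

import numpy as np
from itertools import combinations

rng = np.random.default_rng(5)
n_items = 12
subsets = [np.arange(0, 8), np.arange(4, 12)]      # overlapping subsets of manageable size

def subset_partition_draws(idx, n_draws=200):
    """Stand-in for per-subset MCMC output: one cluster-label vector per draw."""
    base = (idx >= 6).astype(int)                  # two "true" clusters: items < 6 vs >= 6
    return [np.where(rng.random(len(idx)) < 0.1, 1 - base, base) for _ in range(n_draws)]

P = np.zeros((n_items, n_items))
cnt = np.zeros((n_items, n_items))
for idx in subsets:
    for z in subset_partition_draws(idx):
        same = (z[:, None] == z[None, :]).astype(float)
        P[np.ix_(idx, idx)] += same
        cnt[np.ix_(idx, idx)] += 1.0
P = np.divide(P, cnt, out=np.zeros_like(P), where=cnt > 0)  # co-clustering probabilities

# Under equal-cost Binder loss, pairs with P > 1/2 should be clustered together;
# union-find takes the transitive closure so the result is a valid partition.
parent = list(range(n_items))
def find(i):
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i
for i, j in combinations(range(n_items), 2):
    if P[i, j] > 0.5:
        parent[find(i)] = find(j)
print("estimated partition labels:", [find(i) for i in range(n_items)])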

Keywords

Bayesian clustering

Decision theory

Variation of information loss

Binder loss

Big data 

Co-Author(s)

Garritt Page, Brigham Young University
Fernando Quintana, Pontificia Universidad Católica de Chile

First Author

David Dahl, Brigham Young University

Presenting Author

David Dahl, Brigham Young University

Variational Bayes for Basis Selection in Functional Data Representation with Correlated Errors

Functional data analysis (FDA) has found extensive application across various fields, driven by the increasing availability of data recorded continuously over a time interval or at several discrete points. FDA provides statistical tools specifically designed for handling such data. Over the past decade, Variational Bayes (VB) algorithms have gained popularity in FDA, primarily due to their speed advantages over MCMC methods. This work proposes a VB algorithm for basis function selection in functional data representation while allowing for a complex error covariance structure. We assess and compare the effectiveness of our proposed VB algorithm with MCMC via simulations. We also apply our approach to a publicly available dataset. Our results show accurate coefficient estimation and demonstrate the efficacy of our VB algorithm in finding the true set of basis functions. Notably, our proposed VB algorithm achieves performance comparable to MCMC at substantially reduced computational cost. 
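A minimal sketch of the basis-selection idea under simplifying assumptions (independent errors, fixed hyperparameters, and a standard spike-and-slab coordinate-ascent VB update standing in for the authors' algorithm): represent the curve in a cubic B-spline basis and report a posterior inclusion probability per basis function.

import numpy as np
from scipy.interpolate import BSpline

rng = np.random.default_rng(6)
t = np.linspace(0, 1, 150)
knots = np.r_[[0.0] * 4, np.linspace(0, 1, 8)[1:-1], [1.0] * 4]  # cubic B-spline knots
K = len(knots) - 4                                               # number of basis functions
B = np.column_stack([
    np.nan_to_num(BSpline.basis_element(knots[k:k + 5], extrapolate=False)(t))
    for k in range(K)
])

c_true = np.zeros(K)
c_true[[2, 5]] = [3.0, -2.0]                    # only two basis functions are active
y = B @ c_true + 0.3 * rng.standard_normal(len(t))

sigma2, s2, pi0 = 0.09, 4.0, 0.2                # fixed noise, slab, and prior-inclusion values
alpha, mu = np.full(K, 0.5), np.zeros(K)
for _ in range(50):                             # CAVI coordinate sweeps
    for k in range(K):
        r = y - B @ (alpha * mu) + B[:, k] * (alpha[k] * mu[k])  # residual excluding basis k
        s2k = sigma2 / (B[:, k] @ B[:, k] + sigma2 / s2)
        mu[k] = s2k / sigma2 * (B[:, k] @ r)
        logit = np.log(pi0 / (1 - pi0)) + 0.5 * np.log(s2k / s2) + mu[k] ** 2 / (2 * s2k)
        alpha[k] = 1.0 / (1.0 + np.exp(-logit))
print("posterior inclusion probabilities:", np.round(alpha, 2))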

Keywords

Bayesian inference

Functional data

Variational EM

Basis function selection

Correlated errors 

Co-Author(s)

Camila De Souza, University of Western Ontario
Pedro Henrique Toledo de Oliveira Sousa, Federal University of Paraná

First Author

Ana Carolina da Cruz, University of Western Ontario

Presenting Author

Ana Carolina da Cruz, University of Western Ontario