Wednesday, Aug 6: 10:30 AM - 12:20 PM
4151
Contributed Papers
Music City Center
Room: CC-103B
Main Sponsor
Section on Bayesian Statistical Science
Presentations
We extend the work of Hahn & Carvalho (2015) and develop a doubly-regularized sparse regression estimator by synthesizing Bayesian regularization with penalized least squares within a decision-theoretic framework. In contrast to existing Bayesian decision-theoretic formulations, which rely chiefly on the symmetric 0-1 loss, the new method, which we call Bayesian Decoupling, employs a family of penalized loss functions indexed by a sparsity-tuning parameter. We propose a class of reweighted l1 penalties, with two specific instances that achieve simultaneous bias reduction and convexity. The design of the penalties incorporates considerations of signal sizes, as enabled by the Bayesian paradigm. The tuning parameter is selected using a posterior benchmarking criterion, which quantifies the drop in predictive power relative to the optimal Bayes estimator under squared error loss. Additionally, in contrast to the widely used median probability model technique, which selects variables by thresholding posterior inclusion probabilities at the fixed value of 1/2, Bayesian Decoupling enables a data-driven threshold that automatically adapts to estimated signal sizes.
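A minimal sketch of the decoupling idea, assuming posterior draws of the regression coefficients are available from a fitted Bayesian shrinkage model. The adaptive weights, the lasso solver, and the function name below are illustrative stand-ins, not the talk's specific reweighted l1 penalties or benchmarking rule.

    import numpy as np
    from sklearn.linear_model import Lasso

    def decoupled_estimate(X, beta_draws, lam):
        """Summarize the posterior, then sparsify (in the spirit of Hahn & Carvalho, 2015).
        beta_draws is an (S, p) array of posterior draws; the 1/|posterior mean| weights
        are an adaptive-lasso-style illustration of signal-size-aware reweighting."""
        beta_bar = beta_draws.mean(axis=0)            # posterior mean of coefficients
        y_fit = X @ beta_bar                          # posterior mean of the linear predictor
        w = 1.0 / (np.abs(beta_bar) + 1e-8)           # larger penalty on weaker signals
        Xw = X / w                                    # column rescaling turns the weighted l1
        fit = Lasso(alpha=lam, fit_intercept=False).fit(Xw, y_fit)  # problem into a plain lasso
        return fit.coef_ / w                          # undo the rescaling

The sparsity-tuning parameter lam would then be chosen by scanning a grid and keeping the sparsest fit whose predictive loss stays within a tolerated margin of the Bayes estimator's, which is the role the posterior benchmarking criterion plays in the abstract.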
Keywords
Decision theory
Loss function
Model selection
Penalized least squares
Sparse estimation
Tuning parameter selection
Non-separable penalty functions are often used in regression modeling to enforce group sparsity structure, reduce the influence of unusual features, and improve estimation and prediction by providing a more realistic match between model and data. From a Bayesian perspective, such penalty functions correspond to a lack of (conditional) prior independence among the regression coefficients. We describe a class of prior distributions for regression coefficients that generates non-separable penalty functions. The priors have connections to L1-norm penalization and the Bayesian lasso (BL) and elastic net (BEN) regression models. The regularization properties of the class of priors can be understood both by studying its tunable parameters directly and via the connections to BL and BEN regression. We discuss full Bayesian inference under these priors and variable selection via Bayes factors and posterior model probabilities. Inference and prediction under the class of priors are shown to perform competitively across a range of example data structures.
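To make the prior-penalty correspondence concrete: the posterior mode under a prior p(beta) solves a penalized least squares problem whose penalty is -log p(beta), so separable priors give coordinate-wise penalties while prior dependence among coefficients yields non-separable ones. The group-structured example below is purely illustrative and is not claimed to be the class of priors presented in the talk.

    \hat\beta_{\mathrm{MAP}}
      = \arg\min_{\beta}\; \tfrac{1}{2\sigma^{2}}\,\|y - X\beta\|_2^2 \;-\; \log p(\beta)
    % Separable examples:
    %   Bayesian lasso:        p(\beta) \propto \exp\{-\lambda \|\beta\|_1\}
    %   Bayesian elastic net:  p(\beta) \propto \exp\{-\lambda_1 \|\beta\|_1 - \lambda_2 \|\beta\|_2^2\}
    % A non-separable (group-structured) example:
    %   p(\beta) \propto \exp\{-\lambda \sum_g \|\beta_g\|_2\},
    % which does not factor over individual coordinates within a group.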
Keywords
Bayesian elastic net
Bayesian lasso
Penalized regression
Scientific statistical models are often defined by generative processes for simulating synthetic data, but many, such as the sequential sampling models (SSMs) used in psychology and consumer behavior, involve intractable likelihoods. Likelihood-free inference (LFI) methods address this challenge, enabling Bayesian parameter inference for such models. We propose to apply Multi-objective Bayesian Optimization (MOBO) to LFI for parameter estimation from multi-source data, such as estimating SSM parameters from response times and choice outcomes. This approach models the discrepancy for each data source separately and uses MOBO to efficiently approximate the joint likelihood. The multivariate approach also identifies conflicting information from different data sources and provides insight into their relative importance for estimating individual parameters. We demonstrate the advantages of MOBO over single-discrepancy methods through a synthetic data example and a real-world application evaluating ride-hailing drivers' preferences for electric vehicle rentals in Singapore. While focused on SSMs, our method generalizes to likelihood-free calibration of other multi-source models.
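A hedged sketch of the per-source discrepancy step that a MOBO-based LFI scheme would optimize. The simulator interface and the summary statistics are assumptions made for illustration; they are not the authors' implementation.

    import numpy as np

    def discrepancies(theta, obs_rt, obs_choice, simulate, n_sim=1000, seed=None):
        """Per-source discrepancies for a sequential sampling model. `simulate` is assumed
        to return (response_times, choices) for parameter vector theta; the summaries
        below (median response time, choice proportion) are illustrative."""
        rng = np.random.default_rng(seed)
        sim_rt, sim_choice = simulate(theta, n_sim, rng)
        d_rt = abs(np.median(sim_rt) - np.median(obs_rt))          # response-time discrepancy
        d_choice = abs(np.mean(sim_choice) - np.mean(obs_choice))  # choice-outcome discrepancy
        return np.array([d_rt, d_choice])

A multi-objective BO loop would place a separate surrogate (for example, a GP) on each discrepancy, propose new theta values with a multi-objective acquisition function, and approximate the joint likelihood from the fitted surrogates; keeping the two objectives separate is what lets the method flag conflict between data sources.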
Keywords
Likelihood-Free Inference
Sequential Sampling Models
Multi-objective Bayesian Optimization
Parametric Bayesian models are specified by a prior distribution over the parameter. In simple models, the parameter vector is low-dimensional and the posterior concentrates around the "truth" at an appropriate rate, provided the model is exactly right for the data. However, such models behave differently when the stream of data arises from a distribution that lies outside the parametric family under consideration. In this case, analyses typically show mixed asymptotic performance: although the Bayes estimator may be consistent for the parameter of interest, Bayes estimators for nuisance parameters are inconsistent. As a consequence, credible intervals do not cover the parameter of interest at the nominal rate, even asymptotically. This phenomenon is well known for Bayesian versions of quantile regression, an important exemplar of the generalized Bayes technology.
This talk examines the phenomenon of miscalibration of misspecified models. We advocate the use of meaningful parameters, construct families of robust models that are indexed by these parameters, discuss the relationship between prior distribution and sensitivity analysis, and suggest methods for handling calibration.
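For concreteness, a standard generalized Bayes formulation of quantile regression replaces the likelihood with the check loss; the learning-rate w below is the tempering constant whose choice governs the coverage of credible intervals, which is the miscalibration issue examined in the talk.

    \pi_w(\beta \mid y) \;\propto\; \pi(\beta)\,
      \exp\!\Big\{-\,w \sum_{i=1}^{n} \rho_\tau\big(y_i - x_i^{\top}\beta\big)\Big\},
    \qquad
    \rho_\tau(u) = u\,\big(\tau - \mathbf{1}\{u < 0\}\big).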
Keywords
Bayes
misspecified model
sensitivity analysis
generalized Bayes
robust model
Posterior distributions in ill-posed Bayesian inverse problems are often analytically intractable and highly sensitive to prior assumptions. We study how a sample representation of the posterior evolves as prior parameters change, enabling sensitivity analysis for small perturbations and solution continuation for larger shifts. Our focus is on a class of non-conjugate hierarchical models that promote sparsity in linear inverse problems. These models, parameterized by a small set of shape parameters, encompass most classical sparsity-promoting priors. As the parameters change, the posterior transitions from a tractable unimodal distribution to an intractable multimodal one. To track these changes, we use Stein Variational Gradient Descent augmented with Birth-Death sampling, allowing efficient mass exchange between modes while optimizing the kernel bandwidth. Our approach effectively samples multimodal posteriors and provides robust sensitivity analysis, as demonstrated in experimental results.
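A bare-bones SVGD update with an RBF kernel and the median-heuristic bandwidth, included only to fix the sampler the abstract builds on; the birth-death augmentation and the bandwidth optimization described in the talk are not implemented in this sketch.

    import numpy as np

    def svgd_step(particles, grad_log_post, eps=0.05):
        """One Stein Variational Gradient Descent update. particles is (n, d);
        grad_log_post(x) returns the gradient of the log posterior at a point x."""
        n, _ = particles.shape
        diffs = particles[:, None, :] - particles[None, :, :]        # pairwise differences x_i - x_j
        sq = (diffs ** 2).sum(-1)                                     # squared pairwise distances
        h = np.median(sq) / max(np.log(n), 1.0) + 1e-12               # RBF bandwidth (median heuristic)
        K = np.exp(-sq / h)                                           # kernel matrix
        grads = np.stack([grad_log_post(x) for x in particles])       # score at each particle
        repulsion = (2.0 / h) * (K[:, :, None] * diffs).sum(axis=1)   # gradient of the kernel term
        phi = (K @ grads + repulsion) / n                             # Stein variational direction
        return particles + eps * phi

A birth-death layer, as in the abstract, would additionally kill particles stuck in low-probability regions and duplicate particles in under-weighted modes, letting mass move between modes that plain SVGD crosses only slowly.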
Keywords
Bayesian Hierarchical model
Variational Inference
Distribution Evolution
We propose a Bayesian method to cluster large datasets for which obtaining samples from the full posterior distribution is impractical. In Bayesian inference, an estimator is chosen by introducing a loss function and reporting the Bayes rule that minimizes its posterior expectation. Except in trivially small cases, this expectation must be approximated, typically using posterior samples. However, standard algorithms scale poorly, making it difficult to fit models with tens of thousands of items. We address the "big data" setting, where posterior sampling is infeasible, by splitting the data into overlapping subsets of a size manageable for existing MCMC algorithms. The model is fit to each subset independently, generating several sets of posterior samples. Our goal is to use these samples to estimate a partition that approximates the one minimizing the posterior expected loss under the full model. The subset size, number of subsets, and degree of overlap are key tuning parameters, which we explore.
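A sketch of the decision-theoretic point-estimate step on a single set of posterior partition samples, restricting the search to the sampled partitions (a common simplification); how samples from overlapping subsets are pooled into such a set is the methodological question the talk addresses and is not shown here.

    import numpy as np

    def binder_point_estimate(partitions):
        """Return the sampled partition minimizing the posterior expected Binder loss.
        partitions is an (S, n) integer array of cluster labels from S posterior samples."""
        same = partitions[:, :, None] == partitions[:, None, :]   # (S, n, n) co-clustering indicators
        psm = same.mean(axis=0)                                    # posterior similarity matrix
        # Expected Binder loss of sample s, up to a constant factor: sum of |1{same} - P(same)|
        losses = [np.abs(same[s].astype(float) - psm).sum() for s in range(len(partitions))]
        return partitions[int(np.argmin(losses))]

The variation of information loss listed in the keywords can be substituted for the Binder loss in the same template; only the expected-loss computation changes.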
Keywords
Bayesian clustering
Decision theory
Variation of information loss
Binder loss
Big data
Functional data analysis (FDA) has found extensive application across various fields, driven by the increasing availability of data recorded continuously over a time interval or at several discrete points. FDA provides statistical tools specifically designed for handling such data. Over the past decade, Variational Bayes (VB) algorithms have gained popularity in FDA, primarily due to their speed advantages over MCMC methods. This work proposes a VB algorithm for basis function selection in functional data representation while allowing for a complex error covariance structure. We assess and compare the effectiveness of our proposed VB algorithm with MCMC via simulations. We also apply our approach to a publicly available dataset. Our results show accurate coefficient estimation and demonstrate the efficacy of our VB algorithm in finding the true set of basis functions. Notably, the proposed VB algorithm achieves performance comparable to MCMC at substantially reduced computational cost.
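A small illustration of the basis-representation step the abstract assumes, using a Fourier basis and ordinary least squares on a simulated curve; the VB algorithm's basis selection mechanism and the correlated-error covariance are not part of this sketch.

    import numpy as np

    def fourier_basis(t, K):
        """Fourier basis matrix on [0, 1]: an intercept plus K sine/cosine pairs."""
        cols = [np.ones_like(t)]
        for k in range(1, K + 1):
            cols += [np.sin(2 * np.pi * k * t), np.cos(2 * np.pi * k * t)]
        return np.column_stack(cols)

    t = np.linspace(0.0, 1.0, 100)
    B = fourier_basis(t, K=5)                                   # 11 candidate basis functions
    curve = np.sin(2 * np.pi * t) + 0.1 * np.random.default_rng(0).normal(size=t.size)
    coef, *_ = np.linalg.lstsq(B, curve, rcond=None)            # unpenalized coefficients

Basis function selection then amounts to deciding which entries of coef should be exactly zero; the abstract's VB algorithm makes that decision probabilistically while also modeling correlated errors across time points.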
Keywords
Bayesian inference
Functional data
Variational EM
Basis function selection
Correlated errors