Advances in Bayesian Variable Selection

Chair

Emmett Kendall

Organizer

Brian Reich, North Carolina State University
 
Sunday, Aug 4: 2:00 PM - 3:50 PM
1510 
Topic-Contributed Paper Session 
Oregon Convention Center 
Room: CC-G132 

Applied

No

Main Sponsor

International Society for Bayesian Analysis (ISBA)

Co-Sponsors

International Statistical Institute
Section on Bayesian Statistical Science

Presentations

Shrinkage and Selection for High-Dimensional Bayesian Estimating Equations

Bayesian inference typically relies on specification of a likelihood as a key ingredient. Recently, likelihood-free approaches have become popular for avoiding potentially intractable likelihoods. Alternatively, in the frequentist context, estimating equations are a popular choice for inference, corresponding to an assumption on a set of moments (or expectations) of the underlying distribution rather than its exact form. Common examples include generalized estimating equations for correlated responses and M-estimators for robust regression, which avoid distributional assumptions on the errors. In the high-dimensional case, sparsity in both the parameter estimates and the number of estimating equations can be achieved through a Bayesian empirical likelihood approach with carefully specified prior distributions.
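
The sketch below is only a minimal illustration of the generic Bayesian empirical likelihood construction that this abstract builds on, not the speakers' high-dimensional methodology: the likelihood is replaced by a profile empirical likelihood defined through a single moment condition (here a mean), and a Laplace-type shrinkage prior is placed on the parameter. The function names, the moment condition, and the prior scale are all illustrative assumptions.

```python
# Minimal sketch, assuming a mean parameter with moment condition E[x - theta] = 0
# and a Laplace (shrinkage-style) prior; not the speakers' method.
import numpy as np
from scipy.optimize import minimize

def log_el(theta, x):
    """Profile log empirical likelihood ratio via the standard dual form:
    log ELR = -max_lam sum(log(1 + lam * g_i)), valid for theta inside the data range."""
    g = x - theta
    def neg_dual(lam):
        arg = 1.0 + lam[0] * g
        if np.any(arg <= 1e-10):          # keep the implied weights positive
            return 1e10
        return -np.sum(np.log(arg))
    res = minimize(neg_dual, x0=[0.0], method="Nelder-Mead")
    return res.fun                        # equals the log EL ratio (<= 0)

def log_posterior(theta, x, scale=1.0):
    # Unnormalized log posterior: log empirical likelihood plus a Laplace log prior.
    return log_el(theta, x) - abs(theta) / scale

x = np.random.default_rng(1).normal(0.3, 1.0, size=50)
grid = np.linspace(x.min() + 0.01, x.max() - 0.01, 200)
logpost = np.array([log_posterior(t, x) for t in grid])
print("approximate posterior mode:", round(grid[logpost.argmax()], 3))
```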
 

Speaker

Howard Bondell, University of Melbourne

Taming combinatorial explosion with new optimization-induced priors

Combinatorial structures are commonplace in statistical applications, such as those involving phylogenetic trees, gene expression networks, or flow network control. Because the parameter in a combinatorial model is often a high-dimensional integer vector subject to heavy constraints, conducting statistical inference in such a discrete space with combinatorial explosion has been tremendously challenging. To tackle this issue, we propose to exploit integer linear programming as a mapping that induces useful probability distributions on a combinatorial space. Taking a Bayesian approach, we assign a precursor prior (such as a multivariate Gaussian) to a real-valued vector, then transform the distribution onto the vertex set of an integral polytope. We can efficiently estimate the posterior using the graph-accelerated Metropolis-adjusted Langevin algorithm. This framework leads to straightforward model specification, principled uncertainty quantification, and flexible model-based extensions. I will illustrate its application in a capacity control problem for traffic network data.
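
As a toy illustration of the precursor-to-polytope mapping idea (not the speaker's traffic-network application, and without the graph-accelerated MALA sampler), the sketch below draws a Gaussian precursor vector and maps it to a vertex of an assumed cardinality-constrained 0/1 polytope by solving an integer linear program; repeating the draw induces a distribution on the vertices. The polytope, dimension d, budget k, and the function induced_draw are my own illustrative choices.

```python
# Minimal sketch of an optimization-induced distribution on a combinatorial space,
# under assumed toy settings (cardinality-constrained 0/1 polytope).
import numpy as np
from scipy.optimize import milp, LinearConstraint, Bounds

def induced_draw(rng, d=6, k=3):
    """Map one Gaussian precursor draw to a vertex of {x in {0,1}^d : sum(x) <= k}."""
    w = rng.normal(size=d)                                   # real-valued precursor vector
    res = milp(c=-w,                                         # milp minimizes, so negate to maximize w @ x
               constraints=LinearConstraint(np.ones((1, d)), ub=k),
               integrality=np.ones(d),                       # integer decision variables
               bounds=Bounds(0, 1))
    return np.round(res.x).astype(int)

rng = np.random.default_rng(0)
draws = np.array([induced_draw(rng) for _ in range(500)])
# Empirical frequency of each coordinate being selected under the induced distribution.
print("selection frequency per coordinate:", draws.mean(axis=0))
```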

Speaker

Leo Duan, University of Florida

Consistent Effect Estimation in Generalised Sparse Partially Linear Additive Models

Accurately selecting and estimating smooth functional effects in additive models with potentially many functions is a challenging task, especially if the components are decomposed into linear and nonlinear effects. We provide a rigorous definition of the true linear and nonlinear effects of an additive component using projections and introduce a new construction of the Demmler-Reinsch basis for penalised splines. We prove that our representation allows us to consistently estimate the true effects, in contrast to the commonly employed mixed model representations. Equipping the reparameterised regression coefficients with normal beta prime spike-and-slab priors allows us to automatically determine whether a continuous covariate has a linear effect, a nonlinear effect, or no effect at all. We provide new theoretical results for the prior and a compelling explanation for its superior Markov chain Monte Carlo mixing performance compared to the spike-and-slab group lasso prior. Finally, we illustrate the developed methodology with effect selection on the hazard rate of a time-to-event response in the additive Cox regression model, in simulations and on leukemia survival data.
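
For context, the sketch below shows the classical Demmler-Reinsch reparameterisation of a penalised spline (not the new construction proposed in the talk): the basis is rotated so that it is orthonormal and the penalty is diagonal, which separates unpenalised (linear) columns from penalised nonlinear ones. The truncated-power basis, knots, and penalty matrix here are illustrative assumptions.

```python
# Minimal sketch of the classical Demmler-Reinsch construction, under an assumed
# toy truncated-power spline basis; not the talk's new construction.
import numpy as np

def demmler_reinsch(B, P):
    """Return eigenvalues s and an orthonormal design Z (Z.T @ Z = I) in which the
    penalty is diagonal: columns with s ~ 0 span the unpenalised (linear) part,
    the remaining columns span the penalised nonlinear deviation."""
    L = np.linalg.cholesky(B.T @ B)             # B^T B = L L^T
    Linv = np.linalg.inv(L)
    s, U = np.linalg.eigh(Linv @ P @ Linv.T)    # diagonalise the transformed penalty
    Z = B @ Linv.T @ U
    return s, Z

# Toy basis: intercept, linear term, and penalised kink (truncated-line) terms.
x = np.linspace(0.0, 1.0, 100)
knots = np.linspace(0.1, 0.9, 8)
B = np.column_stack([np.ones_like(x), x] + [np.clip(x - k, 0.0, None) for k in knots])
P = np.diag([0.0, 0.0] + [1.0] * len(knots))    # penalise only the kink coefficients
s, Z = demmler_reinsch(B, P)
print("unpenalised (linear) columns:", int(np.sum(s < 1e-8)),
      "| penalised (nonlinear) columns:", int(np.sum(s >= 1e-8)))
```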

Speaker

Nadja Klein, Karlsruhe Institute of Technology

Bayesian Dynamic Tensor Factor Model for High-dimensional Multi-group Longitudinal Neuroimaging Data

Longitudinal neuroimaging data are often collected for studying temporal changes in the brain, leading, e.g., to cognitive decline with age or neurodegenerative diseases. The analyses of such data present daunting structural complexities, dimensionality issues, and modeling and computational challenges. To overcome these hurdles, we introduce a novel individualized longitudinal image regression model that combines several popular low-rank frameworks, namely basis function representation, latent factor models, and tensor factor models. Specifically, through the combined use of a basis mixture representation of the stacked images followed by a Tucker tensor factorization of the associated basis coefficients, we accommodate smooth variations in both space and time, account for differences between groups, and capture subject heterogeneity within those groups, while also obtaining a massive multifold reduction in model dimensions.
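
The sketch below is only an illustration of a plain Tucker factorization (via higher-order SVD) of a coefficient array with assumed modes (subject × basis coefficient × visit); it is not the authors' Bayesian model, but it shows the kind of multifold dimension reduction the abstract describes. Mode names, sizes, and ranks are assumptions.

```python
# Minimal sketch: Tucker/HOSVD factorization of an assumed coefficient array,
# illustrating the dimension reduction; not the authors' Bayesian model.
import numpy as np

def unfold(T, mode):
    """Matricize tensor T along the given mode."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def tucker_hosvd(T, ranks):
    """Truncated higher-order SVD: mode-wise factor matrices, then the core tensor."""
    factors = []
    for mode, r in enumerate(ranks):
        U, _, _ = np.linalg.svd(unfold(T, mode), full_matrices=False)
        factors.append(U[:, :r])                          # top-r left singular vectors
    core = T
    for mode, U in enumerate(factors):                    # project onto each factor space
        core = np.moveaxis(np.tensordot(U.T, core, axes=(1, mode)), 0, mode)
    return core, factors

rng = np.random.default_rng(0)
coef = rng.normal(size=(40, 60, 5))                       # subjects x basis coefficients x visits
core, factors = tucker_hosvd(coef, ranks=(5, 8, 2))
n_full = coef.size
n_reduced = core.size + sum(U.size for U in factors)
print(f"model dimensions: {n_full} -> {n_reduced}")
```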

Speaker

Arkaprava Roy, University of Florida

Model-free generalized fiducial inference for empirical risk minimizers

Model-free generalized fiducial (MFGF) inference was previously introduced to facilitate the development of safe and reliable methods for uncertainty quantification in machine learning. That work proposed and developed a model-free statistical framework for imprecise probabilistic prediction inference with finite-sample control of frequentist Type I errors. It also found that approximating a belief/plausibility measure pair by an (in some sense optimal) probability measure in the credal set is a critical resolution needed for the broader adoption of imprecise probabilistic approaches to inference in the statistical and machine learning communities. In this new work, we develop ideas for transforming the MFGF predictive inference framework to provide safe and reliable uncertainty quantification for empirical risk minimizers. Important special cases include parameters of a specified likelihood function, tuning parameters in regularized regression, and uncertainty quantification for model selection.
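
To fix ideas, the sketch below constructs only the kind of target this framework addresses: an empirical risk minimizer for ridge-regularized regression, where lam plays the role of a tuning parameter. The MFGF machinery that would attach belief/plausibility-based uncertainty to this minimizer is not implemented here; the data, names, and penalty choice are illustrative assumptions.

```python
# Minimal sketch of an empirical risk minimizer (the target of inference only);
# the MFGF uncertainty quantification itself is not shown.
import numpy as np
from scipy.optimize import minimize

def empirical_risk(beta, X, y, lam):
    # Average squared-error loss plus an L2 (ridge) penalty with tuning parameter lam.
    return np.mean((y - X @ beta) ** 2) + lam * np.sum(beta ** 2)

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X @ np.array([1.5, 0.0, -2.0, 0.0, 0.5]) + rng.normal(size=200)

res = minimize(empirical_risk, x0=np.zeros(5), args=(X, y, 0.1))
print("empirical risk minimizer:", np.round(res.x, 3))
```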

Speaker

Jonathan Williams, North Carolina State University