Bayesian Hierarchical and Latent Variable Models: New Directions

Chair: Mahsa Ashouri
Miami University
 
Sunday, Aug 3: 2:00 PM - 3:50 PM
4003 
Contributed Papers 
Music City Center 
Room: CC-106C 

Main Sponsor

Section on Bayesian Statistical Science

Presentations

A Fully Bayesian Framework for Built-in Input Dimension Reduction and Gaussian Process Modeling

Gaussian process (GP) modeling is widely used in computational science and engineering. However, fitting a GP to high-dimensional inputs remains challenging due to the curse of dimensionality. While various methods have been proposed to reduce input dimensionality, they typically follow a two-stage approach, performing dimension reduction and GP fitting separately. We introduce a fully Bayesian framework that seamlessly integrates dimensionality reduction with GP modeling and inference. Our approach, built on a hierarchical Bayesian model with priors on the Stiefel manifold, enforces orthonormality on the projection matrix and enables posterior inference via Hamiltonian Monte Carlo with geodesic flow. Additionally, we extend this framework by incorporating Deep Gaussian Processes (DGP) with built-in dimension reduction, providing a more flexible and powerful tool for complex datasets. Through extensive numerical studies, we demonstrate that while the proposed Bayesian method incurs higher computational costs, it improves predictive performance and uncertainty quantification, providing a principled and robust alternative to existing methods. 
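
As a minimal illustration of the built-in dimension reduction idea, the sketch below projects high-dimensional inputs through an orthonormal matrix sampled on the Stiefel manifold and evaluates a GP marginal likelihood on the projection. It assumes an RBF kernel; the function names are illustrative, and the paper's full method places a prior on the projection and samples it via Hamiltonian Monte Carlo with geodesic flow rather than fixing a single draw.

    import numpy as np

    def sample_stiefel(D, d, rng):
        # QR of a D x d Gaussian matrix yields orthonormal columns, a point
        # on the Stiefel manifold; the sign fix makes the draw uniform.
        Q, R = np.linalg.qr(rng.standard_normal((D, d)))
        return Q * np.sign(np.diag(R))

    def log_marginal(X, y, W, lengthscale=1.0, noise=0.1):
        # GP log marginal likelihood with an RBF kernel on the projection XW.
        Z = X @ W
        sq = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
        K = np.exp(-0.5 * sq / lengthscale**2) + noise**2 * np.eye(len(y))
        L = np.linalg.cholesky(K)
        alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
        return -0.5 * y @ alpha - np.log(np.diag(L)).sum() - 0.5 * len(y) * np.log(2 * np.pi)

    rng = np.random.default_rng(0)
    X = rng.standard_normal((50, 20))      # n = 50 inputs in D = 20 dimensions
    y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(50)
    W = sample_stiefel(20, 2, rng)         # one draw of the projection matrix
    print(log_marginal(X, y, W))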

Keywords

Bayesian Inference

Dimension Reduction

Gaussian Processes

Hamiltonian Monte Carlo

Stiefel Manifold

Uncertainty Quantification 

Co-Author(s)

Emily Kang, University of Cincinnati
Bledar Konomi, University of Cincinnati

First Author

Eric Herrison Gyamfi, University of Cincinnati

Presenting Author

Eric Herrison Gyamfi, University of Cincinnati

Adaptive Thresholding in Bayesian Kernel Machine Regression: Improving Sensitivity and Reliability

Bayesian Kernel Machine Regression (BKMR) models complex nonlinear relationships. Fixed posterior inclusion probability (PIP) thresholds (e.g., 0.5) are conventionally used for variable selection, but they can yield inconsistent test size control, influenced by the coefficient of variation (CV) and sample size. This study proposes a dynamic PIP threshold that adjusts for CV and sample size to enhance sensitivity and reliability. A logistic regression model predicted the 95th percentile of PIP (PIP(q95)) using a four-parameter Richards curve, incorporating log-transformed CV and sample size. Simulations across 41 CV values and 6 sample sizes compared fixed and dynamic thresholds, and validation used NHANES (2011–2014) data on urinary metals and cognitive scores. The dynamic threshold maintained nominal test sizes (~5%) across all scenarios, outperforming fixed thresholds. In the NHANES analysis, cadmium emerged as the most influential exposure, and cobalt was retained only because of the dynamic threshold. Nonlinear relationships and cumulative risk analyses confirmed significant cognitive decline at higher exposure quantiles. The dynamic threshold enhances BKMR's reliability and precision.
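
To make the adaptive rule concrete, the sketch below implements a four-parameter Richards curve and a dynamic cutoff indexed by log-transformed CV and sample size, as the abstract describes. The parameter values and the way CV and sample size are combined are placeholders, not the study's fitted model.

    import numpy as np

    def richards(x, lower, upper, rate, shape):
        # Four-parameter Richards (generalized logistic) curve running
        # from the lower to the upper asymptote.
        return lower + (upper - lower) / (1.0 + np.exp(-rate * x)) ** (1.0 / shape)

    def dynamic_pip_threshold(cv, n, params=(0.3, 0.95, 1.2, 1.0)):
        # Predicted 95th percentile of the null PIP distribution; the
        # predictor combining log(CV) and sample size is hypothetical.
        x = np.log(cv) - np.log(n) / 10.0
        return richards(x, *params)

    # A variable is selected only when its PIP exceeds the adaptive cutoff:
    pip = 0.62
    print(pip > dynamic_pip_threshold(cv=0.8, n=500))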

Keywords

Bayesian Kernel Machine Regression (BKMR)

Posterior Inclusion Probability (PIP)

Environmental Health Data

Adaptive Thresholding

Mixture Selection 

Co-Author(s)

Gabriel Odom, Florida International University
Zoran Bursac, Florida International University
Boubakari Ibrahimou, Florida International University

First Author

Kazi Tanvir Hasan, Florida International University

Presenting Author

Kazi Tanvir Hasan, Florida International University

Bayesian Compressed Mixed-Effects Models

Linear mixed-effects models are fundamental in statistical methodology for analyzing repeated-measures and longitudinal data. While regularized maximum likelihood or maximum a posteriori estimation methods are commonly employed, the literature on sampling-based Bayesian inference remains relatively unexplored, mainly because of the computational bottleneck posed by the covariance matrix of the random effects in high-dimensional settings. We propose compressed mixed-effects (CME) models for efficient prediction and fixed-effects selection in high dimensions. These models project a subset of the parameters into a low-dimensional space using random projection matrices, yielding a quasi-likelihood. This allows us to bypass prior specification on the high-dimensional covariance matrix by compressing its Cholesky factor using random projections, and to devise a computationally efficient collapsed Gibbs sampler with shrinkage priors, enabling posterior uncertainty quantification. The CME models show better predictive accuracy, coverage, and selection guarantees than their competitors in diverse simulation settings and in a repeated-measures data analysis.
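
The core compression step can be illustrated in a few lines: a random projection turns the q x q Cholesky factor of the random-effects covariance into a q x m factor, so the implied marginal covariance has only m free directions. This is a sketch of the idea only, under an assumed Gaussian projection; the paper's quasi-likelihood, shrinkage priors, and collapsed Gibbs sampler are not reproduced here.

    import numpy as np

    rng = np.random.default_rng(1)
    n, q, m = 200, 100, 5                 # observations, random effects, compressed dim

    Z = rng.standard_normal((n, q))       # random-effects design matrix
    Phi = rng.standard_normal((q, m)) / np.sqrt(m)   # random projection matrix

    # Full model: cov(y) = Z L L' Z' + sigma^2 I, with L a q x q Cholesky factor.
    # Compressed model: replace L by L @ Phi, a q x m factor, cutting the
    # parameter count from O(q^2) to O(qm).
    L = np.tril(rng.standard_normal((q, q))) * 0.1
    sigma2 = 1.0
    C = L @ Phi
    cov_comp = Z @ C @ C.T @ Z.T + sigma2 * np.eye(n)

    # The random-effects part of the compressed covariance has rank m:
    print(np.linalg.matrix_rank(Z @ C @ C.T @ Z.T))   # -> 5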

Keywords

Gibbs sampling

Parameter expansion

Quasi-likelihood

Random projection

Uncertainty quantification 

Co-Author(s)

Kshitij Khare, University of Florida
Sanvesh Srivastava, University of Iowa

First Author

Sreya Sarkar

Presenting Author

Sreya Sarkar

BayesNMF: Fast Bayesian Poisson NMF with Automatically Learned Rank Applied to Mutational Signatures

Bayesian Non-Negative Matrix Factorization (NMF) is a method of interest across fields including genomics, neuroscience, and audio and image processing. Bayesian Poisson NMF is of particular importance for count data, such as in cancer mutational signatures analysis. However, MCMC methods for Bayesian Poisson NMF require a computationally intensive Poisson augmentation. Further, identifying the latent rank is necessary, but commonly used heuristic approaches are slow and potentially subjective, and methods that learn rank automatically fail to provide posterior uncertainties. We introduce bayesNMF, a computationally efficient Gibbs sampler for Bayesian Poisson NMF. The desired Poisson-likelihood NMF is paired with a Normal-likelihood NMF for high-overlap proposal distributions in approximate Metropolis updates, avoiding augmentation. We additionally define Bayesian factor inclusion (BFI) and sparse Bayesian factor inclusion (SBFI) to identify rank automatically while providing posterior uncertainty. We provide an open-source R software package on GitHub. Our applications focus on mutational signatures, but our software and results can be extended to any use of Bayesian Poisson NMF. 
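
The augmentation-free update can be sketched as a Metropolis step whose proposal plays the role of the Normal-likelihood conditional. In the toy version below, a reflected random-walk proposal stands in for that conditional, priors are omitted, and a single entry of the signatures matrix is updated; the bayesNMF package's actual updates are more involved.

    import numpy as np

    rng = np.random.default_rng(2)

    def poisson_loglik(X, P, E):
        # Poisson NMF log likelihood: X_ij ~ Poisson((P E)_ij).
        M = P @ E
        return np.sum(X * np.log(M) - M)

    def metropolis_update(X, P, E, i, k, prop_sd=0.05):
        # Propose a new value for P[i, k] (reflected at zero to stay
        # non-negative) and accept with the Poisson likelihood ratio;
        # prior terms are omitted in this sketch.
        P_new = P.copy()
        P_new[i, k] = abs(P[i, k] + prop_sd * rng.standard_normal())
        log_ratio = poisson_loglik(X, P_new, E) - poisson_loglik(X, P, E)
        return P_new if np.log(rng.uniform()) < log_ratio else P

    P = rng.gamma(2.0, 1.0, (20, 3))      # signatures
    E = rng.gamma(2.0, 1.0, (3, 30))      # exposures
    X = rng.poisson(P @ E)                # simulated mutation counts
    P = metropolis_update(X, P, E, i=0, k=0)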

Keywords

Non-negative matrix factorization

Efficient Bayesian computation

Gibbs sampling

Mutational signatures 

Co-Author(s)

Nishanth Basava, McCallie School
Giovanni Parmigiani, Dana-Farber Cancer Institute

First Author

Jenna Landy, Harvard University

Presenting Author

Jenna Landy, Harvard University

Hierarchical Bayesian Modelling of Piecewise Continuous Functions using Experimental Data with Error

I apply a hierarchical Bayesian approach to infer aggregation propensity and oligomer size from SEC-MALS data. SEC-MALS is a useful and inexpensive technique for studying molecular systems, but the data have a complex error structure that poses varied challenges. These include the need to integrate data from different measurement types, to model elution curves nonparametrically for fractionated samples, and to account for the correlated error structure of weakly known concentration values in multiple parts of the measurement problem. Bayesian answers are ideal here due to strong prior knowledge of many of the unknown physical quantities, the directness with which complex error structure can be handled, and the need for careful uncertainty quantification of quantities whose experimental precision is likely to be limited. I provide a Bayesian model for this setting, discussing specification, posterior sampling, and adequacy checks. I demonstrate the model by application to data on human gamma-S crystallin, a structural protein of the eye lens whose aggregation leads to cataract.
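
As a toy version of the ingredients just described, the sketch below pairs a piecewise-linear latent elution curve with correlated (AR(1)) measurement error in the observed channel. Both choices are illustrative stand-ins: the paper models the curves nonparametrically, and its error structure is richer than a single AR(1) term.

    import numpy as np
    from scipy.stats import multivariate_normal

    def elution_curve(t, knots, heights):
        # Piecewise-linear latent curve, a simple stand-in for a
        # nonparametric elution profile.
        return np.interp(t, knots, heights)

    def loglik(y, t, knots, heights, sigma, rho):
        # Observations scatter around the latent curve with AR(1)-correlated
        # measurement error of scale sigma and correlation rho.
        mu = elution_curve(t, knots, heights)
        idx = np.arange(len(t))
        cov = sigma**2 * rho ** np.abs(np.subtract.outer(idx, idx))
        return multivariate_normal(mu, cov).logpdf(y)

    t = np.linspace(0.0, 10.0, 50)
    truth = elution_curve(t, [0, 3, 5, 10], [0.0, 1.0, 0.2, 0.0])
    y = truth + 0.05 * np.random.default_rng(4).standard_normal(50)
    print(loglik(y, t, [0, 3, 5, 10], [0.0, 1.0, 0.2, 0.0], sigma=0.05, rho=0.5))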

Keywords

Bayesian hierarchical model

Flow-mode data

Gamma-S crystallin protein

Complex error structure

Measurement error 

Co-Author

Carter Butts, University of California-Irvine

First Author

Frances Beresford, University of California Irvine

Presenting Author

Frances Beresford, University of California Irvine

Nested exemplar latent space models for dimension reduction in dynamic networks

Dynamic latent space models are widely used for characterizing changes in networks and relational data over time. These models assign to each node latent attributes that characterize connectivity with other nodes, with these latent attributes dynamically changing over time. Node attributes can be organized as a three-way tensor with modes corresponding to nodes, latent space dimension, and time. Unfortunately, as the number of nodes and time points increases, the number of elements of this tensor becomes enormous, leading to computational and statistical challenges, particularly when data are sparse. We propose a new approach for massively reducing dimensionality by expressing the latent node attribute tensor as low rank. This leads to an interesting new nested exemplar latent space model, which characterizes the node attribute tensor as dependent on low-dimensional exemplar traits for each node, weights for each latent space dimension, and exemplar curves characterizing time variation. We study properties of this framework, including expressivity, and develop efficient Bayesian inference algorithms. The approach leads to substantial advantages in simulations and in an application to ecological network data.
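
The low-rank construction can be made concrete with a small tensor sketch: exemplar traits per node, weights per latent dimension, and exemplar time curves combine into the nodes x dimension x time attribute tensor. A CP-style product over a shared exemplar index is assumed here; the paper's nested structure may differ in detail.

    import numpy as np

    rng = np.random.default_rng(3)
    n_nodes, n_dim, n_time, n_exemplar = 50, 3, 40, 4

    traits = rng.standard_normal((n_nodes, n_exemplar))   # exemplar traits per node
    weights = rng.standard_normal((n_dim, n_exemplar))    # weights per latent dimension
    curves = rng.standard_normal((n_exemplar, n_time))    # exemplar curves over time

    # X[i, r, t] = sum_k traits[i, k] * weights[r, k] * curves[k, t]
    X = np.einsum('ik,rk,kt->irt', traits, weights, curves)

    # Edge propensities at time t come from inner products of the time-t
    # latent positions, as in standard latent space network models.
    logits = X[:, :, 0] @ X[:, :, 0].T
    print(X.shape, logits.shape)   # (50, 3, 40) (50, 50)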

Keywords

latent factor model

dynamic network

Bayesian nonparametrics

ecology

tensor factorization 

Co-Author(s)

Luca Silva, Bocconi University
Tomas Roslin, Department of Ecology, Swedish University of Agricultural Sciences
David Dunson

First Author

Jennifer Kampe, Duke University

Presenting Author

Jennifer Kampe, Duke University

Scalable and robust regression models for continuous proportional data

Beta regression is used routinely for continuous proportional data, but it often encounters practical issues such as a lack of robustness of regression parameter estimates to misspecification of the beta distribution. We develop an improved class of generalized linear models starting with the continuous binomial (cobin) distribution and further extending to dispersion mixtures of cobin distributions (micobin). The proposed cobin and micobin regression models have attractive robustness, computation, and flexibility properties. A key innovation is the Kolmogorov-Gamma data augmentation scheme, which facilitates Gibbs sampling for Bayesian computation, including in hierarchical cases involving nested, longitudinal, or spatial data. We demonstrate robustness, the ability to handle responses exactly at the boundary (0 or 1), and computational efficiency relative to beta regression in simulation experiments and through analysis of the benthic macroinvertebrate multimetric index of US lakes using lake watershed covariates.
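
The augmentation pattern is analogous to Polya-Gamma augmentation for Bayesian logistic regression: draw latent variables given the linear predictor, then update the coefficients from a Gaussian full conditional. The skeleton below is schematic only; draw_kolmogorov_gamma is a hypothetical placeholder (a Polya-Gamma-style surrogate so the code runs), and kappa = y - 0.5 merely mimics the logistic recipe rather than the cobin likelihood defined in the paper.

    import numpy as np

    def draw_kolmogorov_gamma(eta, rng):
        # Hypothetical stand-in for the paper's Kolmogorov-Gamma sampler;
        # returns a Polya-Gamma-style mean value purely for illustration.
        out = np.full_like(eta, 0.25)
        nz = np.abs(eta) > 1e-8
        out[nz] = np.tanh(eta[nz] / 2.0) / (2.0 * eta[nz])
        return out

    def gibbs_step(y, X, beta, prior_prec, rng):
        # One conditionally conjugate update: draw the latent augmentation
        # variables given the linear predictor, then sample beta from the
        # resulting Gaussian full conditional.
        eta = X @ beta
        omega = draw_kolmogorov_gamma(eta, rng)
        kappa = y - 0.5                     # schematic transformation of y
        prec = X.T @ (omega[:, None] * X) + prior_prec
        mean = np.linalg.solve(prec, X.T @ kappa)
        return rng.multivariate_normal(mean, np.linalg.inv(prec))

    rng = np.random.default_rng(5)
    X = rng.standard_normal((100, 3))
    y = rng.uniform(0.0, 1.0, 100)          # continuous proportions in (0, 1)
    beta = gibbs_step(y, X, np.zeros(3), np.eye(3), rng)
    print(beta)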

Keywords

Bayesian

Bounded response data

Canonical link

Data augmentation

Exponential family

Generalized linear model 

Co-Author(s)

Otso Ovaskainen, University of Helsinki
David Dunson

First Author

Changwoo Lee, Duke University

Presenting Author

Benjamin Dahl, Duke University