Sunday, Aug 3: 2:00 PM - 3:50 PM
4003
Contributed Papers
Music City Center
Room: CC-106C
Main Sponsor
Section on Bayesian Statistical Science
Presentations
Gaussian process (GP) modeling is widely used in computational science and engineering. However, fitting a GP to high-dimensional inputs remains challenging due to the curse of dimensionality. While various methods have been proposed to reduce input dimensionality, they typically follow a two-stage approach, performing dimension reduction and GP fitting separately. We introduce a fully Bayesian framework that seamlessly integrates dimensionality reduction with GP modeling and inference. Our approach, built on a hierarchical Bayesian model with priors on the Stiefel manifold, enforces orthonormality on the projection matrix and enables posterior inference via Hamiltonian Monte Carlo with geodesic flow. Additionally, we extend this framework by incorporating Deep Gaussian Processes (DGP) with built-in dimension reduction, providing a more flexible and powerful tool for complex datasets. Through extensive numerical studies, we demonstrate that while the proposed Bayesian method incurs higher computational costs, it improves predictive performance and uncertainty quantification, providing a principled and robust alternative to existing methods.
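As a rough illustration of the ingredients involved, the sketch below (written for this listing, not the authors' code) pairs an orthonormal projection drawn via QR, i.e., a point on the Stiefel manifold, with the GP log marginal likelihood on the projected inputs; the full method places a prior on this projection and explores it jointly with the GP hyperparameters via Hamiltonian Monte Carlo with geodesic flow, which is omitted here.

import numpy as np

def stiefel_point(rng, D, d):
    # Orthonormal D x d projection (a point on the Stiefel manifold) via QR.
    Q, _ = np.linalg.qr(rng.standard_normal((D, d)))
    return Q

def rbf_kernel(Z, lengthscale=1.0, variance=1.0):
    sq = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return variance * np.exp(-0.5 * sq / lengthscale**2)

def gp_log_marginal(y, Z, noise=1e-2, **kern_args):
    # Log marginal likelihood of a zero-mean GP on the projected inputs Z.
    K = rbf_kernel(Z, **kern_args) + noise * np.eye(len(y))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return (-0.5 * y @ alpha
            - np.log(np.diag(L)).sum()
            - 0.5 * len(y) * np.log(2 * np.pi))

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 20))   # high-dimensional inputs (D = 20)
W = stiefel_point(rng, 20, 2)       # W'W = I_2: candidate projection matrix
y = np.sin(X @ W).sum(1) + 0.1 * rng.standard_normal(50)
print(gp_log_marginal(y, X @ W))    # target that would be explored jointly with W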
Keywords
Bayesian Inference
Dimension Reduction
Gaussian Processes
Hamiltonian Monte Carlo
Stiefel Manifold
Uncertainty Quantification
Bayesian Kernel Machine Regression (BKMR) models complex nonlinear relationships. Variable selection is conventionally based on a fixed posterior inclusion probability (PIP) threshold (e.g., 0.5), which can yield inconsistent control of test size depending on the coefficient of variation (CV) and the sample size. This study proposes a dynamic PIP threshold that adjusts for CV and sample size to enhance sensitivity and reliability. The 95th percentile of the PIP (PIP(q95)) was predicted with a generalized logistic model based on a four-parameter Richards curve, incorporating log-transformed CV and sample size. Simulations across 41 CV values and 6 sample sizes compared fixed and dynamic thresholds. Validation used NHANES (2011–2014) data on urinary metals and cognitive scores. The dynamic threshold maintained nominal test sizes (~5%) across all scenarios, outperforming fixed thresholds. In the NHANES application, cadmium was the most influential metal, while cobalt was retained only under the dynamic threshold. Nonlinear relationships and cumulative risk analyses confirmed significant cognitive decline at higher exposure quantiles. The dynamic threshold enhances BKMR's reliability and precision.
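A schematic of the dynamic threshold, assuming one common four-parameter Richards parametrization and arbitrary placeholder coefficients (the fitted values are reported in the paper and not reproduced here):

import numpy as np

def richards(x, upper=1.0, rate=1.0, midpoint=0.0, shape=1.0):
    # Four-parameter Richards curve (one common parametrization): upper asymptote,
    # growth rate, midpoint, and shape; reduces to a logistic curve when shape == 1.
    return upper / (1.0 + np.exp(-rate * (x - midpoint))) ** (1.0 / shape)

def dynamic_pip_threshold(cv, n, coefs=(-0.5, 0.8, -0.1)):
    # Data-adaptive PIP cutoff as a Richards-curve transform of log CV and log n.
    # The coefficients here are arbitrary placeholders, not the fitted values.
    b0, b1, b2 = coefs
    return richards(b0 + b1 * np.log(cv) + b2 * np.log(n))

# Example: compare the adaptive cutoff with the conventional fixed 0.5 threshold.
pip, cv, n = 0.62, 0.4, 500
cutoff = dynamic_pip_threshold(cv, n)
print(cutoff, pip > cutoff, pip > 0.5)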
Keywords
Bayesian Kernel Machine Regression (BKMR)
Posterior Inclusion Probability (PIP)
Environmental Health Data
Adaptive Thresholding
Mixture Selection
Linear mixed-effects models are fundamental tools for analyzing repeated-measures and longitudinal data. While regularized maximum likelihood or maximum a posteriori estimation methods are commonly employed, sampling-based Bayesian inference remains relatively unexplored, mainly because of the computational bottleneck posed by the covariance matrix of the random effects in high-dimensional settings. We propose compressed mixed-effects (CME) models for efficient prediction and fixed-effects selection in high dimensions. These models project a subset of the parameters into a low-dimensional space using random projection matrices, yielding a quasi-likelihood. This allows us to bypass prior specification on the high-dimensional covariance matrix by compressing its Cholesky factor with random projections, and to devise a computationally efficient collapsed Gibbs sampler with shrinkage priors, enabling posterior uncertainty quantification. In diverse simulation settings and a repeated-measures data analysis, CME models show better predictive accuracy, coverage, and selection performance than their competitors.
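A minimal numerical sketch of the compression idea described in the abstract, with all dimensions and the projection scale chosen arbitrarily for illustration (this is not the authors' implementation):

import numpy as np

rng = np.random.default_rng(1)
n, p, q, m = 200, 10, 50, 5          # observations, fixed effects, random effects, compressed dim

X = rng.standard_normal((n, p))      # fixed-effects design
Z = rng.standard_normal((n, q))      # high-dimensional random-effects design
beta = np.zeros(p); beta[:3] = 1.0
y = X @ beta + rng.standard_normal(n)

# Random projection of the q x q Cholesky factor down to q x m, so that only a
# low-dimensional factor needs a prior rather than the full covariance matrix.
Phi = rng.standard_normal((q, m)) / np.sqrt(m)   # random projection matrix
L_small = np.tril(rng.standard_normal((m, m)))   # low-dimensional Cholesky-type factor
L_comp = Phi @ L_small                           # compressed factor, q x m

sigma2 = 1.0
V = sigma2 * np.eye(n) + Z @ L_comp @ L_comp.T @ Z.T   # implied marginal covariance

# Gaussian quasi-log-likelihood of y under the compressed model.
sign, logdet = np.linalg.slogdet(V)
resid = y - X @ beta
quasi_ll = -0.5 * (logdet + resid @ np.linalg.solve(V, resid) + n * np.log(2 * np.pi))
print(quasi_ll)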
Keywords
Gibbs sampling
Parameter expansion
Quasi-likelihood
Random projection
Uncertainty quantification
Bayesian Non-Negative Matrix Factorization (NMF) is a method of interest across fields including genomics, neuroscience, and audio and image processing. Bayesian Poisson NMF is of particular importance for count data, such as in cancer mutational signatures analysis. However, MCMC methods for Bayesian Poisson NMF require a computationally intensive Poisson augmentation. Further, identifying the latent rank is necessary, but commonly used heuristic approaches are slow and potentially subjective, and methods that learn rank automatically fail to provide posterior uncertainties. We introduce bayesNMF, a computationally efficient Gibbs sampler for Bayesian Poisson NMF. The desired Poisson-likelihood NMF is paired with a Normal-likelihood NMF for high-overlap proposal distributions in approximate Metropolis updates, avoiding augmentation. We additionally define Bayesian factor inclusion (BFI) and sparse Bayesian factor inclusion (SBFI) to identify rank automatically while providing posterior uncertainty. We provide an open-source R software package on GitHub. Our applications focus on mutational signatures, but our software and results can be extended to any use of Bayesian Poisson NMF.
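For intuition, the sketch below sets up Poisson-likelihood NMF for mutation counts and performs a single Metropolis-within-Gibbs update of one exposure entry; it uses a simple multiplicative random-walk proposal as a stand-in for the paper's Normal-likelihood NMF proposal distributions, and the gamma prior settings are arbitrary:

import numpy as np
from scipy.stats import poisson, gamma

rng = np.random.default_rng(2)
K, N, G = 96, 4, 30                      # mutation types, signatures (rank), samples
P = rng.dirichlet(np.ones(K), size=N).T  # K x N signature matrix (columns sum to 1)
E_true = rng.gamma(2.0, 50.0, size=(N, G))
M = rng.poisson(P @ E_true)              # observed mutation counts

def loglik(E):
    return poisson.logpmf(M, P @ E).sum()

def update_entry(E, i, j, step=0.2, a0=2.0, b0=0.01):
    # Metropolis-within-Gibbs update of one exposure entry under the Poisson likelihood.
    prop = E.copy()
    prop[i, j] = E[i, j] * np.exp(step * rng.standard_normal())
    log_acc = (loglik(prop) - loglik(E)
               + gamma.logpdf(prop[i, j], a0, scale=1 / b0)
               - gamma.logpdf(E[i, j], a0, scale=1 / b0)
               + np.log(prop[i, j]) - np.log(E[i, j]))   # Jacobian of the log-scale walk
    return prop if np.log(rng.uniform()) < log_acc else E

E = np.full((N, G), M.mean() / N)
E = update_entry(E, 0, 0)
print(loglik(E))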
Keywords
Non-negative matrix factorization
Efficient Bayesian computation
Gibbs sampling
Mutational signatures
SEC-MALS is a useful and inexpensive technique for studying molecular systems, including quantities such as oligomer size, but the data have a complex error structure that poses varied challenges. These include the need to integrate data from different measurement types, to model elution curves nonparametrically for fractionated samples, and to account for the correlated error structure of weakly known concentration values in multiple parts of the measurement problem. Bayesian methods are ideal here because of strong prior knowledge of many of the unknown physical quantities, the directness with which complex error structure can be handled, and the need for careful uncertainty quantification of quantities whose experimental precision is likely to be limited. I provide a Bayesian model for this setting, discussing specification, posterior sampling, and adequacy checks. I demonstrate the model by application to data on human gamma-S crystallin, a structural protein of the eye lens whose aggregation leads to cataract.
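A hypothetical, heavily simplified sketch of how two measurement types with correlated concentration errors and an informative physical prior could enter a single log posterior; the elution curve, covariance form, and all constants below are placeholders rather than the model described in the talk:

import numpy as np
from scipy.stats import multivariate_normal, norm

rng = np.random.default_rng(3)
n_frac = 40                                   # elution fractions

# Latent "true" concentration profile across fractions; a Gaussian bump stands in
# for a nonparametrically modeled elution curve.
t = np.linspace(0, 1, n_frac)
conc_true = np.exp(-0.5 * ((t - 0.5) / 0.08) ** 2)

# Correlated measurement error across neighbouring fractions (AR(1)-type covariance).
rho, sigma = 0.6, 0.05
Sigma = sigma**2 * rho ** np.abs(np.subtract.outer(np.arange(n_frac), np.arange(n_frac)))
obs_conc = rng.multivariate_normal(conc_true, Sigma)

# Second measurement type: a scattering signal proportional to an unknown physical
# quantity ("mass" below) times concentration.
mass_true = 2.0
obs_ls = mass_true * conc_true + 0.05 * rng.standard_normal(n_frac)

def log_posterior(mass, conc):
    # Joint log posterior combining both measurement types plus a prior on mass.
    lp = norm.logpdf(mass, loc=2.0, scale=0.5)                  # informative physical prior
    lp += multivariate_normal.logpdf(obs_conc, conc, Sigma)     # correlated concentration error
    lp += norm.logpdf(obs_ls, mass * conc, 0.05).sum()          # scattering-signal likelihood
    return lp

print(log_posterior(2.0, conc_true))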
Keywords
Bayesian hierarchical model
Flow-mode data
Gamma-S crystallin protein
Complex error structure
Measurement error
Dynamic latent space models are widely used for characterizing changes in networks and relational data over time. These models assign to each node latent attributes that characterize connectivity with other nodes, with these latent attributes dynamically changing over time. Node attributes can be organized as a three-way tensor with modes corresponding to nodes, latent space dimension, and time. Unfortunately, as the number of nodes and time points increases, the number of elements of this tensor becomes enormous, leading to computational and statistical challenges, particularly when data are sparse. We propose a new approach for massively reducing dimensionality by expressing the latent node attribute tensor as low rank. This leads to an interesting new nested exemplar latent space model, which characterizes the node attribute tensor as dependent on low-dimensional exemplar traits for each node, weights for each latent space dimension, and exemplar curves characterizing time variation. We study properties of this framework, including expressivity, and develop efficient Bayesian inference algorithms. The approach leads to substantial advantages in simulations and applications.
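A plausible reading of the low-rank construction, with exemplar traits, per-dimension weights, and exemplar time curves combined CP-style into the node attribute tensor; the link function and simulation settings below are illustrative assumptions, not the authors' specification:

import numpy as np

rng = np.random.default_rng(4)
n_nodes, n_dim, n_time, n_exemplar = 100, 3, 20, 4   # nodes, latent dims, times, low rank

U = rng.standard_normal((n_nodes, n_exemplar))       # exemplar traits per node
W = rng.standard_normal((n_dim, n_exemplar))         # weights per latent space dimension
tgrid = np.linspace(0, 1, n_time)
C = np.stack([np.sin(2 * np.pi * (k + 1) * tgrid) for k in range(n_exemplar)])  # exemplar curves

# Low-rank (CP-style) construction of the node x dimension x time attribute tensor.
A = np.einsum('ik,rk,kt->irt', U, W, C)              # shape (n_nodes, n_dim, n_time)

def edge_probs(A, t):
    # Edge probabilities at time t from inner products of node attributes (logistic link).
    logits = A[:, :, t] @ A[:, :, t].T
    return 1.0 / (1.0 + np.exp(-logits))

Y_t = rng.binomial(1, edge_probs(A, 5))              # simulated adjacency matrix at time index 5
print(A.shape, Y_t.shape)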
Keywords
latent factor model
dynamic network
Bayesian nonparametrics
ecology
tensor factorization
Beta regression is used routinely for continuous proportional data, but it often encounters practical issues such as a lack of robustness of regression parameter estimates to misspecification of the beta distribution. We develop an improved class of generalized linear models starting with the continuous binomial (cobin) distribution and further extending to dispersion mixtures of cobin distributions (micobin). The proposed cobin regression and micobin regression models have attractive robustness, computation, and flexibility properties. A key innovation is the Kolmogorov-Gamma data augmentation scheme, which facilitates Gibbs sampling for Bayesian computation, including in hierarchical cases involving nested, longitudinal, or spatial data. We demonstrate robustness, ability to handle responses exactly at the boundary (0 or 1), and computational efficiency relative to beta regression in simulation experiments and through analysis of the benthic macroinvertebrate multimetric index of US lakes using lake watershed covariates.
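For context on the keywords below, the generic exponential-family generalized linear model with canonical link into which the cobin and micobin families fall can be written as follows; the specific base measure h and cumulant function b that define the cobin distribution are given in the paper and are not reproduced here.

\[
f(y \mid \theta) = h(y)\,\exp\{\theta y - b(\theta)\}, \qquad
\mathbb{E}[y \mid \theta] = b'(\theta), \qquad
\theta_i = x_i^\top \beta \ \text{(canonical link)},
\]

so that the mean response is \(\mu_i = b'(x_i^\top \beta)\), with \(y \in [0,1]\) for bounded proportional data.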
Keywords
Bayesian
Bounded response data
Canonical link
Data augmentation
Exponential family
Generalized linear model