New Directions in Bayesian Nonparametrics

Changwoo Lee, Chair
Duke University
 
Filippo Ascolani, Organizer
Duke University
 
Tuesday, Aug 5: 10:30 AM - 12:20 PM
0343 
Invited Paper Session 
Music City Center 
Room: CC-208B 

Keywords

Bayesian nonparametrics 

Applied

No

Main Sponsor

Section on Bayesian Statistical Science

Co-Sponsors

Institute of Mathematical Statistics (IMS)
International Society for Bayesian Analysis (ISBA)

Presentations

Double trouble: Predicting new variant counts across two heterogeneous populations

Collecting genomics data across multiple heterogeneous populations (e.g., across different cancer types) has the potential to improve our understanding of disease. Despite sequencing advances, though, resources often remain a constraint when gathering data. So it would be useful for experimental design if experimenters with access to a pilot study could predict the number of new variants they might expect to find in a follow-up study: both the number of new variants shared between the populations and the total across the populations. While many authors have developed prediction methods for the single-population case, we show that these predictions can fare poorly across multiple populations that are heterogeneous. We show that the Bayesian nonparametric (BNP) framework of a state-of-the-art single-population predictor facilitates a natural extension to multiple populations. However, we prove that a particularly natural choice of prior within this framework fails for fundamental reasons. By supplying an alternative BNP prior choice, we provide the first predictor for the number of new shared variants and new total variants that can handle heterogeneity in multiple populations. We show that our proposed method works well empirically using real cancer and population genetics data. 
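For orientation, here is a minimal sketch of the kind of single-population baseline the abstract says can fare poorly under heterogeneity. It assumes a three-parameter (stable) beta process / Indian buffet process prior with known mass `alpha`, concentration `c` and discount `sigma`, and returns the prior-predictive expected number of new variants in `M` follow-up samples after `N` pilot samples. The function name and parameter values are illustrative; in practice the parameters would have to be fit from pilot data, and this is not the multi-population predictor proposed in the talk.

```python
import math

def expected_new_variants(N, M, alpha, c, sigma):
    """Expected number of new variants among M follow-up samples after N pilot samples,
    under a three-parameter beta process / Indian buffet process prior (assumed here).

    alpha > 0 is the mass, c > -sigma the concentration, 0 <= sigma < 1 the discount.
    """
    # constant factor Gamma(1 + c) / Gamma(c + sigma), kept on the log scale
    log_const = math.lgamma(1 + c) - math.lgamma(c + sigma)
    total = 0.0
    for m in range(N + 1, N + M + 1):
        # the m-th sample contributes a Poisson number of new variants with mean
        # alpha * Gamma(1+c) * Gamma(m-1+c+sigma) / (Gamma(m+c) * Gamma(c+sigma))
        total += alpha * math.exp(log_const + math.lgamma(m - 1 + c + sigma) - math.lgamma(m + c))
    return total

# standard IBP (c = 1, sigma = 0): the m-th sample adds alpha / m new variants on average
print(expected_new_variants(N=0, M=3, alpha=1.0, c=1.0, sigma=0.0))
```

With `c = 1` and `sigma = 0` the sum reduces to `alpha * (1/(N+1) + ... + 1/(N+M))`, the familiar Indian buffet process rate; a positive `sigma` slows the decay and so predicts more new variants in larger follow-up studies.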

Keywords

Bayesian nonparametrics

genomics data

number of variants

beta process

Poisson point process 

Speaker

Tamara Broderick, MIT

Addressing Heterogeneity in High-Dimensional Regression through Bayesian Structured Sparse Clustering

In many high-dimensional regression settings, it is appealing to impose low-dimensional structures on the coefficients. Additionally, clustering the coefficients helps uncover latent groups that reflect heterogeneity in the relationship between covariates and outcomes.
Clustering such high-dimensional data under low-dimensional constraints poses computational challenges, especially for optimization methods, because the mixture problem is nonconvex. While Bayesian methods offer a natural framework for sampling from the mixture model and quantifying uncertainty, specifying the prior remains difficult: spike-and-slab priors make sampling computationally complex, whereas continuous shrinkage priors are ineffective at inducing exact sparsity within mixture models. To address these challenges, we propose an optimization-driven structured sparse prior within a nonparametric Bayesian clustering approach. The hierarchical prior structure enables an efficient and straightforward Gibbs sampler. From a theoretical standpoint, we establish consistency results for both parameter recovery (at optimal rates) and clustering accuracy. We illustrate the effectiveness of the proposed method through a compositional regression task, applying it to the analysis of GDP contributions from multiple industries across 51 states.
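The abstract does not spell out the prior, so the following is an assumption rather than the talk's construction: it shows one generic way an optimization step can produce the exact zeros that a continuous shrinkage prior alone does not, by taking a continuous draw and projecting it onto an ℓ1 ball with the standard sort-based Euclidean projection. The helper `project_l1_ball` and the radius value are hypothetical.

```python
import numpy as np

def project_l1_ball(v, radius):
    """Euclidean projection of v onto {w : ||w||_1 <= radius}, radius > 0 (sort-based algorithm).

    Hypothetical helper for illustration; not confirmed as the prior used in the talk.
    """
    v = np.asarray(v, dtype=float)
    if np.abs(v).sum() <= radius:
        return v.copy()                    # already inside the ball: nothing to do
    u = np.sort(np.abs(v))[::-1]           # magnitudes in descending order
    cssv = np.cumsum(u)
    j = np.arange(1, len(u) + 1)
    # last (0-based) index where u_j - (sum_{i<=j} u_i - radius) / j stays positive
    rho = np.nonzero(u - (cssv - radius) / j > 0)[0][-1]
    theta = (cssv[rho] - radius) / (rho + 1)
    # soft-threshold at theta: every |v_i| <= theta becomes exactly zero
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)

# a continuous (Gaussian) draw has no exact zeros; its projection typically does
rng = np.random.default_rng(0)
beta = project_l1_ball(rng.normal(size=10), radius=2.0)
print(beta, "exact zeros:", int(np.sum(beta == 0)))
```

The appeal of this kind of construction is that the unconstrained draw can be sampled with ordinary continuous updates while the projected value still carries exact sparsity; how the talk's prior achieves this within the mixture model is not described in the abstract.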

Keywords

Bayesian Nonparametrics

Dimension Reduction

Compositional Regression 

Speaker

Maoran Xu

Multivariate species sampling models

Species sampling processes have long provided a fundamental framework for random discrete distributions and exchangeable sequences. However, analyzing data from distinct yet related sources requires a broader notion of probabilistic invariance, with partial exchangeability as the natural choice. Over the past two decades, numerous models for partially exchangeable data, known as dependent nonparametric priors, have emerged, including hierarchical, nested, and additive processes. Despite their widespread use in statistics and machine learning, a unifying framework remains elusive, leaving key questions about their learning mechanisms unanswered.
We fill this gap by introducing multivariate species sampling models, a general class of nonparametric priors encompassing most existing dependent nonparametric processes. These models are defined by a partially exchangeable partition probability function, encoding the induced multivariate clustering structure. We establish their core distributional properties and dependence structure, showing that borrowing of information across groups is entirely determined by shared ties. This provides new insights into their learning mechanisms, including a principled explanation for the correlation structure observed in existing models.
Beyond offering a cohesive theoretical foundation, our approach serves as a constructive tool for developing new models and opens new research directions aimed at capturing even richer dependence structures. 
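Hierarchical processes are among the dependent priors the abstract lists. As an illustration of ties shared across groups, here is a minimal Chinese restaurant franchise sampler for the hierarchical Dirichlet process: observations in different groups share a value only when tables in both groups are served the same dish from the common top-level urn. The function name `crf_sample` and the parameters `alpha` (group-level concentration) and `gamma` (top-level concentration) are illustrative choices, not the talk's notation.

```python
import random

def crf_sample(group_sizes, alpha, gamma, seed=0):
    """Draw cluster (dish) labels for several groups from a Chinese restaurant franchise (HDP)."""
    rng = random.Random(seed)
    dish_tables = []  # dish_tables[k] = number of tables, across all groups, serving dish k
    labels = []
    for n in group_sizes:
        table_counts = []  # customers seated at each table of this group
        table_dish = []    # dish served at each table of this group
        group_labels = []
        for _ in range(n):
            # join table t with probability ∝ its size, or open a new table ∝ alpha
            weights = table_counts + [alpha]
            t = rng.choices(range(len(weights)), weights=weights)[0]
            if t == len(table_counts):
                # new table: reuse dish k ∝ number of tables serving it, or a new dish ∝ gamma
                dish_weights = dish_tables + [gamma]
                k = rng.choices(range(len(dish_weights)), weights=dish_weights)[0]
                if k == len(dish_tables):
                    dish_tables.append(0)
                dish_tables[k] += 1
                table_counts.append(0)
                table_dish.append(k)
            table_counts[t] += 1
            group_labels.append(table_dish[t])
        labels.append(group_labels)
    return labels

labels = crf_sample([50, 50], alpha=1.0, gamma=1.0, seed=1)
shared = set(labels[0]) & set(labels[1])  # values tied across the two groups
print("distinct per group:", [len(set(g)) for g in labels], "shared:", len(shared))
```

Groups are processed one after another, which is a valid way to draw from this partially exchangeable model; a smaller `gamma` forces more tables onto the same dishes and therefore more cross-group ties.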

Keywords

Bayesian Nonparametrics

Dependent nonparametric prior

Dirichlet process

Partial exchangeability

Pitman-Yor process

Random partition 

Speaker

Igor Pruenster, Bocconi University

Nonparametric Empirical Bayes and Selective Inference

Consider a multi-population inference problem where the goal is to estimate the mean of the population with the highest observed sample average. The usual confidence interval fails in this case: its coverage falls increasingly below the nominal level as the number of populations grows. This phenomenon is often referred to as the Winner's Curse. Various modifications have been proposed to adjust for the selection step. We show that interval procedures that guarantee nominal coverage conditional on the selection event typically have infinite expected length. This result motivates us to consider empirical Bayesian solutions that offer coverage guarantees only on average over some parameter subspace. Nonparametric empirical Bayesian solutions are shown to generally offer good coverage with high precision, but they can perform poorly when one population is very different from all others, a clear violation of the underlying exchangeability assumption. We conclude with further mitigation strategies and discuss their frequentist and Bayesian interpretations.
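A small simulation illustrates the coverage loss described above, under simplifying assumptions not taken from the abstract: K populations with identical true means (all 0), one observation each with known unit variance, and the naive 95% interval X_max ± 1.96 reported for the selected population. Under these assumptions the coverage also has a closed form, Φ(1.96)^K − Φ(−1.96)^K. The function names are hypothetical.

```python
import math
import numpy as np

def naive_coverage(K, reps=20_000, z=1.96, seed=0):
    """Monte Carlo coverage of X_max ± z for the selected population's mean.

    Assumes K populations, all with true mean 0 and sd 1, one observation each.
    """
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((reps, K))
    winner = x.max(axis=1)  # observed value of the population selected as the "winner"
    # the interval covers the true mean 0 exactly when |winner| <= z
    return float(np.mean(np.abs(winner) <= z))

def exact_coverage(K, z=1.96):
    """P(-z <= max of K iid N(0,1) <= z) = Phi(z)^K - Phi(-z)^K."""
    phi = lambda t: 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))
    return phi(z) ** K - phi(-z) ** K

for K in (1, 10, 100):
    print(K, naive_coverage(K), exact_coverage(K))
```

The closed form gives roughly 0.95, 0.78 and 0.08 for K = 1, 10 and 100, so the nominal 95% interval becomes badly anti-conservative once selection is ignored. This sketch only reproduces the problem; it does not implement the selective or empirical Bayes intervals discussed in the talk.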

Keywords

Selective inference

Winner's curse

Infinite length confidence intervals

Hierarchical Bayes

Empirical Bayes

Predictive recursion 

Speaker

Surya Tokdar, Duke University