Multivariate species sampling models

Igor Pruenster, Speaker
Bocconi University
 
Tuesday, Aug 5: 11:25 AM - 11:50 AM
Invited Paper Session 
Music City Center 

Description

Species sampling processes have long provided a fundamental framework for random discrete distributions and exchangeable sequences. However, analyzing data from distinct yet related sources requires a broader notion of probabilistic invariance, with partial exchangeability as the natural choice. Over the past two decades, numerous models for partially exchangeable data, known as dependent nonparametric priors, have emerged, including hierarchical, nested, and additive processes. Despite their widespread use in statistics and machine learning, a unifying framework remains elusive, leaving key questions about their learning mechanisms unanswered.
We fill this gap by introducing multivariate species sampling models, a general class of nonparametric priors encompassing most existing dependent nonparametric processes. These models are defined by a partially exchangeable partition probability function, which encodes the induced multivariate clustering structure. We establish their core distributional properties and dependence structure, showing that borrowing of information across groups is entirely determined by shared ties. This provides new insights into their learning mechanisms, including a principled explanation for the correlation structure observed in existing models.
Beyond offering a cohesive theoretical foundation, our approach serves as a constructive tool for developing new models and opens new research directions aimed at capturing even richer dependence structures.
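
As a rough illustration of what "shared ties" across groups can look like, the sketch below simulates species labels for two groups from a hierarchical Dirichlet process, one familiar example of the hierarchical processes mentioned above, using its Chinese restaurant franchise representation. It is not the construction presented in the talk: the function names and the concentration parameters alpha (group level) and gamma (top level) are assumptions chosen for illustration only.

```python
import random


def sample_hdp_labels(group_sizes, alpha=1.0, gamma=1.0, seed=0):
    """Draw species labels for several groups via a Chinese restaurant franchise.

    alpha and gamma are illustrative concentration parameters (assumed > 0).
    Returns one list of integer species labels per group.
    """
    rng = random.Random(seed)
    dish_table_counts = []  # tables serving each species, pooled over all groups
    labels = []
    for n in group_sizes:
        table_sizes = []  # customers seated at each table of this group
        table_dish = []   # species served at each table of this group
        group_labels = []
        for _ in range(n):
            # Join an existing table w.p. proportional to its size,
            # or open a new table w.p. proportional to alpha.
            weights = table_sizes + [alpha]
            t = rng.choices(range(len(weights)), weights=weights)[0]
            if t == len(table_sizes):
                # A new table picks its species from the shared top level:
                # an existing species w.p. proportional to its table count,
                # or a brand-new species w.p. proportional to gamma.
                dish_weights = dish_table_counts + [gamma]
                d = rng.choices(range(len(dish_weights)), weights=dish_weights)[0]
                if d == len(dish_table_counts):
                    dish_table_counts.append(0)
                dish_table_counts[d] += 1
                table_sizes.append(0)
                table_dish.append(d)
            table_sizes[t] += 1
            group_labels.append(table_dish[t])
        labels.append(group_labels)
    return labels


def shared_species(labels_a, labels_b):
    """Species observed in both groups, i.e. the ties shared across groups."""
    return set(labels_a) & set(labels_b)


groups = sample_hdp_labels([50, 50], alpha=1.0, gamma=1.0, seed=1)
print(len(shared_species(groups[0], groups[1])), "species shared across the two groups")
```

In this toy setting, a species can appear in both groups only because new tables in every group draw from the same top-level pool. That pooling is the route by which one group's observations inform predictions for the other.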

Keywords

Bayesian Nonparametrics

Dependent nonparametric prior

Dirichlet process

Partial exchangeability

Pitman-Yor process

Random partition