On the Asymptotics of Exchangeable Sequences of Clusters
Sunday, Aug 3: 3:20 PM - 3:35 PM
1748
Contributed Papers
Music City Center
Most commonly-deployed models for clustering rely on a prior distribution on the cluster sizes. Many such models in wide use, including the Dirichlet and Pitman-Yor processes, provably exhibit macro-clustering: the largest cluster size grows linearly in the sample size. This property is ill-suited to applications in, for example, social network analysis and genomics, where one might expect cluster sizes to grow slowly or even remain bounded as sample sizes grow. The Exchangeable Sequences of Clusters (ESC) model, developed to account for this exhibits micro-clustering: the size of the largest cluster grows sublinearly in the sample size. Under the ESC model, cluster sizes are independently identically distributed according to a distribution on the positive integers, conditional on their forming an integer partition of the sample size. In contrast to commonly used clustering priors, little is known about the behavior of the ESC model. In this paper, we work to close this gap by establishing the asymptotic growth rates and asymptotic distributions for both the number of clusters and the size of the largest cluster under the ESC model.
clustering
random partitions
micorclustering
Renewal Theory
Regular Variations
Bell polynomials
Main Sponsor
IMS
You have unsaved changes.