On the Asymptotics of Exchangeable Sequences of Clusters

Keith Levin Co-Author
University of Wisconsin
 
Nathan Aviles First Author
University of Wisconsin-Madison
 
Nathan Aviles Presenting Author
University of Wisconsin-Madison
 
Sunday, Aug 3: 3:20 PM - 3:35 PM
1748 
Contributed Papers 
Music City Center 
Most commonly-deployed models for clustering rely on a prior distribution on the cluster sizes. Many such models in wide use, including the Dirichlet and Pitman-Yor processes, provably exhibit macro-clustering: the largest cluster size grows linearly in the sample size. This property is ill-suited to applications in, for example, social network analysis and genomics, where one might expect cluster sizes to grow slowly or even remain bounded as sample sizes grow. The Exchangeable Sequences of Clusters (ESC) model, developed to account for this exhibits micro-clustering: the size of the largest cluster grows sublinearly in the sample size. Under the ESC model, cluster sizes are independently identically distributed according to a distribution on the positive integers, conditional on their forming an integer partition of the sample size. In contrast to commonly used clustering priors, little is known about the behavior of the ESC model. In this paper, we work to close this gap by establishing the asymptotic growth rates and asymptotic distributions for both the number of clusters and the size of the largest cluster under the ESC model.

Keywords

clustering

random partitions

micorclustering

Renewal Theory

Regular Variations

Bell polynomials 

Main Sponsor

IMS