Sparse Bayesian Clustering for Bounded Data via a Multivariate Beta Mixture Model
Sunday, Aug 2: 2:00 PM - 3:50 PM
3361
Contributed Speed
We develop a Bayesian overfitted multivariate beta mixture model for clustering aggregated ecological data bounded between 0 and 1. Such data, common in social determinants of health (SDoH) research, pose challenges for standard clustering methods due to restrictive distributional assumptions and limited interpretability. The proposed model reparameterizes the multivariate beta distribution in terms of mean and concentration parameters, enabling direct interpretation of cluster-specific profiles while accommodating skewness inherent in the data. Integrated feature saliency operates on cluster means to induce sparsity by identifying variables that meaningfully drive clustering and shrinking uninformative features toward a shared mean. An overfitted mixture formulation supports data-driven inference on the number of clusters while preserving posterior uncertainty. We assess performance through simulation studies and apply the model to neighborhood-level SDoH data from the Agency for Healthcare Research and Quality, yielding interpretable ecological clusters. The framework generalizes to a broad class of bounded, aggregated multivariate data.
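As a rough illustration of the mean and concentration reparameterization mentioned above, the sketch below shows the standard univariate case only. It assumes the usual mapping a = mu * phi and b = (1 - mu) * phi. The abstract does not spell out the multivariate construction, so the function names (`beta_shapes`, `beta_logpdf`, `beta_variance`) and this mapping are illustrative assumptions, not the authors' implementation.

```python
import math


def beta_shapes(mu, phi):
    """Map a mean mu in (0, 1) and concentration phi > 0 to shapes (a, b).

    Assumed mapping (standard for the univariate beta): a = mu * phi,
    b = (1 - mu) * phi, so that a / (a + b) = mu and a + b = phi.
    """
    if not 0.0 < mu < 1.0:
        raise ValueError("mu must lie strictly between 0 and 1")
    if phi <= 0.0:
        raise ValueError("phi must be positive")
    return mu * phi, (1.0 - mu) * phi


def beta_logpdf(x, mu, phi):
    """Log-density of a univariate beta at x in (0, 1), given (mu, phi)."""
    if not 0.0 < x < 1.0:
        raise ValueError("x must lie strictly between 0 and 1")
    a, b = beta_shapes(mu, phi)
    # Log of the normalizing constant 1 / B(a, b).
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return log_norm + (a - 1.0) * math.log(x) + (b - 1.0) * math.log1p(-x)


def beta_variance(mu, phi):
    """Variance implied by (mu, phi): mu * (1 - mu) / (phi + 1)."""
    return mu * (1.0 - mu) / (phi + 1.0)
```

Under this parameterization the mean is read off directly as mu, while a larger phi tightens the distribution around that mean. This is what makes cluster-specific profiles directly interpretable in the univariate case.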
Bayesian mixture model
multivariate beta distribution
sparse modeling
ecological data
feature saliency
Main Sponsor
Section on Bayesian Statistical Science