Bayesian Mixture Models for Histograms: with Applications to Large Datasets
Thursday, Aug 8: 8:50 AM - 9:05 AM
3661
Contributed Papers
Oregon Convention Center
It is not uncommon, for privacy or summarization purposes, to receive data as a table or histogram, that is, as bins with associated frequencies. In this work we present a method that fits a mixture distribution to model the probability density function of the underlying population. We focus on mixtures of normal distributions; however, the method could be generalized to mixtures of other distributions. A prior is placed on the number of mixture components, which may be finite or countably infinite, and inference is carried out using reversible jump MCMC. We demonstrate attractive properties of the method that show a great deal of promise for modeling large data problems using a Bayesian nonparametric approach. Additionally, we consider the case of multiple histograms and cluster them using the Dirichlet process. This clustering allows information to be shared between populations and provides a posterior probability of homogeneity between populations.
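As a rough illustration of how a normal mixture can be tied to histogram data, the sketch below evaluates a binned-data log-likelihood. The abstract does not state the likelihood used, so this assumes the common multinomial form, in which each bin's probability is the mixture CDF at its upper edge minus the CDF at its lower edge. The function names (`normal_cdf`, `mixture_cdf`, `bin_probabilities`, `binned_log_likelihood`) and the example counts are hypothetical and chosen only for illustration.

```python
import math

def normal_cdf(x, mean, sd):
    """CDF of a normal distribution; handles x = +/-inf via math.erf."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))

def mixture_cdf(x, weights, means, sds):
    """CDF of a finite normal mixture with the given weights."""
    return sum(w * normal_cdf(x, m, s) for w, m, s in zip(weights, means, sds))

def bin_probabilities(edges, weights, means, sds):
    """Probability mass the mixture assigns to each bin [edges[i], edges[i+1])."""
    cdf = [mixture_cdf(e, weights, means, sds) for e in edges]
    return [hi - lo for lo, hi in zip(cdf[:-1], cdf[1:])]

def binned_log_likelihood(edges, counts, weights, means, sds):
    """Multinomial log-likelihood of the bin counts, up to an additive constant.

    Assumption: the bins cover the whole support (open-ended outer bins),
    so the bin probabilities sum to one.
    """
    if len(edges) != len(counts) + 1:
        raise ValueError("need exactly one more edge than counts")
    probs = bin_probabilities(edges, weights, means, sds)
    # Floor tiny probabilities so that log() stays finite.
    return sum(n * math.log(max(p, 1e-300)) for n, p in zip(counts, probs) if n > 0)

# Hypothetical histogram with open-ended outer bins.
edges = [-math.inf, -1.0, 0.0, 1.0, math.inf]
counts = [16, 34, 34, 16]

well_centered = binned_log_likelihood(edges, counts, [1.0], [0.0], [1.0])
badly_centered = binned_log_likelihood(edges, counts, [1.0], [3.0], [1.0])
print(well_centered > badly_centered)
```

Because the likelihood depends on the data only through the bin counts, the cost of one evaluation grows with the number of bins and components rather than with the number of underlying observations. A sampler such as reversible jump MCMC would call a function of this kind repeatedly while proposing changes to the weights, means, standard deviations, and number of components; that sampler is not shown here.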
Dirichlet Process
Data Privacy
Big Data
Main Sponsor
Section on Bayesian Statistical Science