Bayesian Mixture Models for Histograms: with Applications to Large Datasets

Richard Warr First Author
Brigham Young University
 
Richard Warr Presenting Author
Brigham Young University
 
Thursday, Aug 8: 8:50 AM - 9:05 AM
3661 
Contributed Papers 
Oregon Convention Center 
It is not uncommon to receive data in a table or histogram format, with bins and associated frequencies, either to protect privacy or to summarize a large dataset. In this work we present a method that fits a mixture distribution to model the probability density function of the underlying population. We focus on mixtures of normal distributions; however, the method could be generalized to mixtures of other distributions. A prior is placed on the number of mixture components, which may be finite or countably infinite, and inference is carried out using reversible jump MCMC. We demonstrate attractive properties of the method, which shows a great deal of promise for modeling large data problems using a Bayesian nonparametric approach. Additionally, we consider the case of multiple histograms and cluster them using the Dirichlet process. This clustering allows information to be shared between populations and provides a posterior probability of homogeneity between populations.
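
As a rough illustration of how a normal mixture can be tied to histogram data, the sketch below evaluates a multinomial log-likelihood in which each bin probability is the mixture CDF at the upper edge minus the mixture CDF at the lower edge. This is a standard treatment of binned data and is only an assumption about how such a model could be set up; the abstract does not state the likelihood actually used. The function names, the example bin edges, and the example counts are all hypothetical, and the reversible jump MCMC step and the Dirichlet process clustering are not shown.

```python
import math


def normal_cdf(x, mean, sd):
    """CDF of a normal distribution; also handles x = +/- infinity."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def mixture_cdf(x, weights, means, sds):
    """CDF of a finite normal mixture with the given component weights."""
    return sum(w * normal_cdf(x, m, s) for w, m, s in zip(weights, means, sds))


def binned_log_likelihood(edges, counts, weights, means, sds):
    """Multinomial log-likelihood of histogram counts under a normal mixture.

    The multinomial coefficient is omitted because it does not depend on the
    mixture parameters. Bin probabilities sum to one only if the outer edges
    cover the whole real line (e.g. -inf and +inf for open-ended bins).
    """
    if len(edges) != len(counts) + 1:
        raise ValueError("need exactly one more edge than counts")
    total = 0.0
    for lo, hi, n in zip(edges[:-1], edges[1:], counts):
        p = mixture_cdf(hi, weights, means, sds) - mixture_cdf(lo, weights, means, sds)
        if n > 0:
            # Guard against log(0) when a bin has negligible mass.
            total += n * math.log(max(p, 1e-300))
    return total


# Hypothetical histogram: five bins, the outer two open-ended.
example_edges = [-math.inf, -1.0, 0.0, 1.0, 2.0, math.inf]
example_counts = [120, 310, 330, 170, 70]

# Hypothetical two-component mixture evaluated against that histogram.
example_ll = binned_log_likelihood(
    example_edges, example_counts,
    weights=[0.7, 0.3], means=[0.0, 1.5], sds=[1.0, 0.8],
)
```

Because the likelihood depends on the data only through the bin counts, the cost of one evaluation grows with the number of bins and mixture components rather than with the number of underlying observations, which is one reason a histogram-based approach is appealing for large datasets.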

Keywords

Dirichlet Process

Data Privacy

Big Data 

Main Sponsor

Section on Bayesian Statistical Science