The strange and fascinating world of density mixtures

Andrew Yarger Chair
Purdue University
 
Andrew Yarger Organizer
Purdue University
 
Michael Levine Organizer
Purdue University
 
Tuesday, Aug 5: 2:00 PM - 3:50 PM
0745 
Topic-Contributed Paper Session 
Music City Center 
Room: CC-101B 

Applied


Main Sponsor

IMS

Co Sponsors

Section on Nonparametric Statistics
Section on Statistical Learning and Data Science

Presentations

Inference with Contaminated Data: Harnessing the potentials of nonparametric finite mixtures

In some studies, participants are first classified as either having or not having the characteristic of interest based on diagnostic tools, but such classifiers may not be perfectly accurate. Diagnostic misclassification has been shown to introduce severe bias into estimates of treatment effects and to lead to grossly inaccurate inferences. We aim to address these problems in a fully nonparametric setting. Methods for estimating and testing meaningful, fully nonparametric treatment effects are developed. The proposed methods apply to outcomes measured on ordinal, discrete, or continuous scales, and they do not require distributional assumptions such as the existence of moments. The applications of the proposed methods are illustrated using gene expression profiling of bronchial airway brushings in asthmatic and healthy control subjects. 
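As a toy illustration of the mixture structure that misclassification induces (not the speaker's method; all rates, sample sizes, and distributions below are hypothetical), each observed group's CDF is a two-component mixture of the true case and control CDFs, and when the misclassification rates are known the mixture can be inverted:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: true case outcomes ~ N(1,1), true controls ~ N(0,1).
n = 2000
cases, controls = rng.normal(1, 1, n), rng.normal(0, 1, n)

# An imperfect classifier (assumed sensitivity 0.9, specificity 0.8) mislabels
# some subjects, so each observed group is a two-component mixture.
obs_case = np.concatenate([cases[:1800], controls[:400]])
obs_ctrl = np.concatenate([cases[1800:], controls[400:]])

def ecdf(sample, x):
    # Empirical CDF of `sample` evaluated at the points `x`.
    return np.searchsorted(np.sort(sample), x, side="right") / len(sample)

a = 1800 / len(obs_case)   # P(truly a case | labelled case)
b = 200 / len(obs_ctrl)    # P(truly a case | labelled control)

grid = np.linspace(-4, 5, 400)
Fc_obs, Fh_obs = ecdf(obs_case, grid), ecdf(obs_ctrl, grid)

# Invert  Fc_obs = a*F1 + (1-a)*F0  and  Fh_obs = b*F1 + (1-b)*F0.
det = a * (1 - b) - b * (1 - a)
F1 = ((1 - b) * Fc_obs - (1 - a) * Fh_obs) / det  # recovered case CDF
F0 = (a * Fh_obs - b * Fc_obs) / det              # recovered control CDF
```

The recovered CDFs track the uncontaminated empirical CDFs up to sampling error, which is the starting point for the misclassification-corrected nonparametric effects the abstract describes.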

Speaker

Solomon Harrar, University of Kentucky

Learning Topic Hierarchies by Tree-Directed Latent Variable Models

We study a parametric family of latent variable models, namely topic models, equipped with a hierarchical structure among the topic variables. Such models may be viewed as a finite mixture of latent Dirichlet allocation (LDA)-induced distributions, but the LDA components are constrained by a latent hierarchy, specifically a rooted and directed tree structure, which enables the learning of interpretable latent topic hierarchies. A mathematical framework is developed to establish identifiability of the latent topic hierarchy under suitable regularity conditions and to derive bounds on posterior contraction rates for the model and its parameters. We demonstrate the usefulness of such models and validate their theoretical properties through a careful simulation study and a real data example using New York Times articles. 
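One plausible reading of the generative structure (a sketch under an assumed tree, vocabulary size, and hyperparameters, not the authors' exact specification): each tree node carries a topic, a document selects one root-to-leaf branch, and an LDA-style document is drawn from the topics along that branch:

```python
import numpy as np

rng = np.random.default_rng(1)
V = 50  # hypothetical vocabulary size

# A rooted directed tree over topic indices: node -> children.
tree = {0: [1, 2], 1: [3, 4], 2: [5, 6]}
leaves = [3, 4, 5, 6]

def path_to_root(node):
    # Return the root-to-node branch of the tree.
    parent = {c: p for p, cs in tree.items() for c in cs}
    path = [node]
    while path[-1] in parent:
        path.append(parent[path[-1]])
    return path[::-1]

# Each tree node carries its own word distribution (a topic).
topic_word = rng.dirichlet(np.ones(V) * 0.1, size=7)

def sample_doc(n_words=100):
    # Mixture-of-LDAs view: pick one branch (the mixture component),
    # then generate the document LDA-style from that branch's topics.
    branch = path_to_root(leaves[rng.integers(len(leaves))])
    theta = rng.dirichlet(np.ones(len(branch)))    # per-document topic weights
    z = rng.choice(branch, size=n_words, p=theta)  # topic of each word
    return np.array([rng.choice(V, p=topic_word[t]) for t in z]), branch

words, branch = sample_doc()
```

Every branch shares the root topic, which is what makes the learned hierarchy interpretable: topics near the root are common to many components, topics at the leaves are specialized.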

Co-Author

Long Nguyen, University of Michigan

Speaker

Sunrit Chakraborty, Duke University

Cluster weighted models using skewed distributions for functional data

We propose a new model-based clustering method, funWeightClustSkew, for heterogeneous functional linear regression data. The method builds on the functional high-dimensional data clustering (funHDDC) method. We use multivariate functional principal component analysis and assume that the scores follow one of three skewed distributions: the skew-t, the variance-gamma, or the normal-inverse Gaussian distribution. We consider several parsimonious models and propose a variant of the Expectation-Maximization (EM) algorithm for parameter estimation. 
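A simplified sketch of the two-stage idea, with Gaussian mixture components standing in for the skewed (skew-t, variance-gamma, normal-inverse Gaussian) families and plain PCA on discretized curves standing in for multivariate functional PCA; the curves and all settings are hypothetical:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(2)
t = np.linspace(0, 1, 50)

# Two groups of noisy curves with different mean functions (simulated data).
g1 = np.sin(2 * np.pi * t) + rng.normal(0, 0.3, (100, 50))
g2 = np.cos(2 * np.pi * t) + rng.normal(0, 0.3, (100, 50))
curves = np.vstack([g1, g2])

# Step 1: functional PCA, approximated here by PCA on the discretized curves.
scores = PCA(n_components=3).fit_transform(curves)

# Step 2: model-based clustering of the scores via EM.  funWeightClustSkew
# fits skewed component distributions; as a stand-in we use sklearn's
# Gaussian mixture, which follows the same E-step/M-step pattern.
labels = GaussianMixture(n_components=2, random_state=0).fit_predict(curves if False else scores)
```

Replacing the Gaussian score distribution with a skewed family changes the M-step updates but not the overall score-then-cluster architecture sketched here.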

Co-Author

Roy Shivam Ram Shreshtth, Indian Institute of Technology Kanpur

Speaker

Cristina Anton, MacEwan University

Composite Transportation Divergence and Finite Mixture Models

When data from a statistical population is large and distributed across multiple locations, initial estimates of the population distribution are often computed on local machines. These local estimators are then transmitted to a central machine for aggregation. For parametric models, simple aggregation via arithmetic means typically achieves optimal convergence rates. However, in finite mixture models, where the parameter space is non-Euclidean, proper aggregation demands more nuanced approaches, considering both computational and statistical challenges. To address the computational burden, we propose using the composite transportation divergence to aggregate mixture distributions. This divergence-based approach identifies an aggregated estimator that is optimal under the defined criteria. We introduce an MM algorithm guaranteed to converge to at least a local optimum after a finite number of iterations. Our method is further applicable to Gaussian mixture reduction, where a high-order Gaussian mixture is approximated by one of lower order. Under slightly stronger assumptions, the aggregated estimator retains the optimal convergence rate and can be made tolerant to Byzantine failures.

This work is based on joint research with Qiong Zhang and Gong Archer Zhang. 
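A minimal sketch of Gaussian mixture reduction in the spirit described above, alternating an assignment step and a moment-matching step (an MM-style scheme) with a KL component-to-component cost; the mixture, cost choice, and initialization are illustrative assumptions, not the authors' exact algorithm:

```python
import numpy as np

def kl_gauss(m1, v1, m2, v2):
    # KL divergence between univariate Gaussians N(m1,v1) and N(m2,v2).
    return 0.5 * (np.log(v2 / v1) + (v1 + (m1 - m2) ** 2) / v2 - 1.0)

def reduce_mixture(w, mu, var, M, n_iter=50):
    # Alternate: (1) assign each original component to its KL-closest reduced
    # component, (2) replace each reduced component by the moment match
    # (KL barycenter) of the components assigned to it.
    idx = np.argsort(mu)[np.linspace(0, len(w) - 1, M).astype(int)]
    rm, rv = mu[idx].copy(), var[idx].copy()
    rw = np.full(M, 1.0 / M)
    for _ in range(n_iter):
        cost = kl_gauss(mu[:, None], var[:, None], rm[None, :], rv[None, :])
        assign = cost.argmin(axis=1)
        for j in range(M):
            sel = assign == j
            if not sel.any():
                continue
            wj = w[sel] / w[sel].sum()
            rm[j] = wj @ mu[sel]                              # matched mean
            rv[j] = wj @ (var[sel] + (mu[sel] - rm[j]) ** 2)  # matched variance
        rw = np.array([w[assign == j].sum() for j in range(M)])
    return rw, rm, rv

# A 6-component mixture with two well-separated clumps, reduced to order 2.
w = np.full(6, 1 / 6)
mu = np.array([-5.2, -5.0, -4.8, 4.8, 5.0, 5.2])
var = np.ones(6)
rw, rm, rv = reduce_mixture(w, mu, var, M=2)
```

Each full sweep can only decrease the total assignment cost, which is the mechanism behind the finite-iteration convergence to a local optimum mentioned in the abstract.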

Speaker

Jiahua Chen, University of British Columbia

Nonparametric density mixtures and smoothed penalized nonparametric likelihood: the match made in heaven

Finite density mixture models are among the most important tools for exploring heterogeneous data in many application areas. Non- and semiparametric finite density mixture models are a relatively new field of research within the wider area of finite density mixture models that has a lot to offer in terms of theory, methodology, and applications. In this presentation, we discuss a general approach to designing algorithms for estimating the components of these models based on the nonparametric smoothed penalized maximum likelihood. This approach yields convergent algorithms for many different semi- and nonparametric finite density mixture models, including multivariate ones, and in doing so it conceptually unifies many seemingly disparate mixture models. We also illustrate the usefulness of the proposed approach by showing the large-sample consistency of the implicit estimator that results from applying this method. Several simulations and real-life applications round out our presentation. 
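A rough sketch of a smoothed-EM-style iteration for a two-component nonparametric mixture with conditionally independent coordinates, where kernel-smoothed component densities are re-estimated from posterior weights at each pass; this is an illustrative simplification with simulated data, not the presenter's exact smoothed penalized likelihood algorithm:

```python
import numpy as np

rng = np.random.default_rng(3)

# Simulated 2-D data: two well-separated components, coordinates independent.
n = 400
X = np.vstack([rng.normal(-2, 1, (n, 2)), rng.normal(2, 1, (n, 2))])

def wkde(x_eval, data, weights, h=0.4):
    # Weighted Gaussian kernel density estimate.
    u = (x_eval[:, None] - data[None, :]) / h
    k = np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)
    return k @ weights / (weights.sum() * h)

lam = np.array([0.5, 0.5])                       # mixing proportions
# Crude soft initialization from the sign of the first coordinate.
p0 = np.where(X[:, 0] < 0, 0.9, 0.1)
post = np.column_stack([p0, 1 - p0])
for _ in range(30):
    dens = np.ones((len(X), 2))
    for j in range(2):
        for d in range(2):                       # conditional independence
            dens[:, j] *= wkde(X[:, d], X[:, d], post[:, j])
    num = lam[None, :] * dens
    post = num / num.sum(axis=1, keepdims=True)  # E-step: posteriors
    lam = post.mean(axis=0)                      # M-step: proportions
labels = post.argmax(axis=1)
```

The component densities here are fully nonparametric (weighted kernel estimates), so only the mixing proportions are finite-dimensional; the smoothing is what makes each density update well defined.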

Speaker

Michael Levine, Purdue University