Statistical Modeling with Tree Graphs

Leo Duan Chair
University of Florida
 
Leo Duan Organizer
University of Florida
 
Wednesday, Aug 6: 2:00 PM - 3:50 PM
0172 
Invited Paper Session 
Music City Center 
Room: CC-208A 

Applied

Yes

Main Sponsor

Section on Bayesian Statistical Science

Co Sponsors

IMS
International Society for Bayesian Analysis (ISBA)

Presentations

Spatial clustering consistency of random spanning tree partition models under infill-domain asymptotics

In this research, we propose a novel blocking Bayesian spatial random spanning tree model for latent spatially piecewise constant functions. We divide the spatial domain into several disjoint blocks and construct a random spanning tree model to merge blocks into spatially contiguous clusters. Under the spatially varying coefficient model setting, we provide conditions on the asymptotic rates of the number of blocks and the prior number of clusters to obtain spatial clustering consistency results under infill-domain asymptotics. These conditions serve as guidelines for choosing hyperparameters in our model. Based on the clustering consistency results, we also derive the Bayesian posterior convergence rates of the latent spatially varying coefficients and the regression mean functions.
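The block-and-merge construction described above can be illustrated with a minimal sketch. It assumes a rectangular grid of blocks, draws a spanning tree by running Kruskal's algorithm on randomly shuffled edges (which is not a uniform draw over spanning trees), and cuts k - 1 tree edges at random. The function names `grid_edges`, `random_spanning_tree`, and `tree_partition` are hypothetical, and none of this is the authors' prior or sampler; it only shows why cutting tree edges yields spatially contiguous clusters.

```python
import random


def grid_edges(rows, cols):
    """Edges between horizontally and vertically adjacent blocks on a grid."""
    edges = []
    for r in range(rows):
        for c in range(cols):
            v = r * cols + c
            if c + 1 < cols:
                edges.append((v, v + 1))
            if r + 1 < rows:
                edges.append((v, v + cols))
    return edges


def random_spanning_tree(n, edges, rng):
    """Kruskal's algorithm on shuffled edges (assumption: not a uniform draw)."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    shuffled = edges[:]
    rng.shuffle(shuffled)
    tree = []
    for u, v in shuffled:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            tree.append((u, v))
    return tree


def tree_partition(n, tree, k, rng):
    """Cut k - 1 tree edges at random and label the resulting components."""
    kept = rng.sample(tree, len(tree) - (k - 1))
    adj = {v: [] for v in range(n)}
    for u, v in kept:
        adj[u].append(v)
        adj[v].append(u)
    labels = [-1] * n
    cluster = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = cluster
        stack = [start]
        while stack:
            x = stack.pop()
            for y in adj[x]:
                if labels[y] == -1:
                    labels[y] = cluster
                    stack.append(y)
        cluster += 1
    return labels


rng = random.Random(0)
tree = random_spanning_tree(20, grid_edges(4, 5), rng)
labels = tree_partition(20, tree, 3, rng)  # one cluster label per block
```

Because every kept edge joins adjacent blocks, each cluster is connected in the grid graph, and removing k - 1 edges from a spanning tree always leaves exactly k clusters.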

Keywords

Bayesian Posterior Concentration Theory, Infill domain asymptotics, Random Spanning Trees, Spatial Clustering 

Co-Author(s)

Huiyan Sang
Kun Huang, Center for Statistical Science of Tsinghua University

Speaker

Huiyan Sang

Quantitative traits on massive trees with many incomplete measurements

Comparative biologists are often interested in inferring covariation between multiple biological traits sampled across numerous related taxa. To properly study these relationships, we must control for the shared evolutionary history of the taxa to avoid spurious inference. An additional challenge arises because obtaining a full suite of measurements becomes increasingly difficult as the number of taxa grows. This generally necessitates data imputation or integration, and existing control techniques typically scale poorly as the number of taxa increases. We propose an inference technique that integrates out missing measurements analytically and scales linearly with the number of taxa by using a post-order traversal algorithm under a multivariate Brownian diffusion (MBD) model to characterize trait evolution. We further exploit this technique to extend the MBD model to account for sampling error or non-heritable residual variance. We apply these methods to mammalian life history traits, prokaryotic genomic and phenotypic traits, and HIV infection traits. We find computational efficiency increases that top two orders of magnitude over current best practices. While we focus on the utility of this algorithm in phylogenetic comparative methods, our approach generalizes to solve long-standing challenges in computing the likelihood for matrix-normal and multivariate normal distributions with missing data at scale.
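As a rough illustration of the post-order idea, the sketch below computes a Brownian diffusion log-likelihood on a tree in a single pass, integrating out missing tips analytically. It is a simplified stand-in, not the authors' method: it handles one trait rather than several, treats a tip as either fully observed or fully missing, assumes strictly positive branch lengths, and places a normal prior on the root value. The function `bm_log_likelihood` and its arguments are hypothetical names chosen for this example.

```python
import math


def log_normal_pdf(x, mean, var):
    """Log density of a univariate normal distribution."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def bm_log_likelihood(children, tip_values, root, sigma2, root_mean, root_var):
    """Log-likelihood of observed tips under univariate Brownian diffusion.

    children:   dict mapping node -> list of (child, branch_length)
    tip_values: dict mapping tip -> observed value, or None if missing
    Assumes every branch length is strictly positive.
    """
    # Pre-order listing; its reverse visits every child before its parent.
    order, stack = [], [root]
    while stack:
        node = stack.pop()
        order.append(node)
        stack.extend(child for child, _ in children.get(node, []))

    message = {}  # node -> (mean, variance) of partial likelihood, or None
    loglik = 0.0
    for node in reversed(order):
        kids = children.get(node, [])
        if not kids:
            y = tip_values.get(node)
            message[node] = (y, 0.0) if y is not None else None
            continue
        mean, prec = 0.0, 0.0
        for child, length in kids:
            if message[child] is None:
                continue  # nothing observed below: contributes a factor of 1
            m_c, v_c = message[child]
            v_c += sigma2 * length  # diffuse along the branch
            if prec > 0.0:
                # Normalizing constant from multiplying two Gaussian factors.
                loglik += log_normal_pdf(m_c, mean, v_c + 1.0 / prec)
            new_prec = prec + 1.0 / v_c
            mean = (prec * mean + m_c / v_c) / new_prec
            prec = new_prec
        message[node] = (mean, 1.0 / prec) if prec > 0.0 else None

    if message[root] is None:
        return loglik  # no observed data anywhere on the tree
    m_r, v_r = message[root]
    return loglik + log_normal_pdf(m_r, root_mean, v_r + root_var)


# Hypothetical three-tip tree in which tip "C" is unmeasured.
children = {"root": [("I", 1.0), ("C", 2.0)], "I": [("A", 0.5), ("B", 0.5)]}
tips = {"A": 1.0, "B": 2.0, "C": None}
print(bm_log_likelihood(children, tips, "root", 2.0, 0.0, 3.0))
```

Each node is visited once and does a constant amount of work per child, so the cost grows linearly with the number of nodes. A missing tip simply sends no message to its parent, which is what integrating it out amounts to in this simplified setting.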

Keywords

Bayesian computation, Phylogenetics and phylodynamics, Stochastic processes

Co-Author(s)

Xiang Ji, Tulane University
Marc Suchard, University of California-Los Angeles

Speaker

Xiang Ji, Tulane University

Bag of DAGs: Inferring Directional Dependence in Spatiotemporal Processes

We propose a class of nonstationary processes to characterize space- and time-varying directional associations in point-referenced data. We are motivated by spatiotemporal modeling of air pollutants in which local wind patterns are key determinants of the pollutant spread, but information regarding prevailing wind directions may be missing or unreliable. We propose to map a discrete set of wind directions to edges in a sparse directed acyclic graph (DAG), accounting for uncertainty in directional correlation patterns across a domain. The resulting Bag of DAGs processes (BAGs) lead to interpretable nonstationarity and scale to large data owing to the sparsity of the DAGs in the bag. We outline Bayesian hierarchical models using BAGs and illustrate the inferential and performance gains of our methods compared with state-of-the-art alternatives. We analyze fine particulate matter using high-resolution data from low-cost air quality sensors in California during the 2020 wildfire season. An R package is available on GitHub.

Keywords

Spatial multivariate modeling 

Co-Author(s)

Michele Peruzzi, University of Michigan
David Dunson

Speaker

Bora Jin, Duke University