Innovative Methodologies for Spatiotemporal Modeling and Inference

Chair: Ning Ning
Organizer: Ning Ning
 
Monday, Aug 4: 8:30 AM - 10:20 AM
Session 0432
Invited Paper Session 
Music City Center 
Room: CC-202C 

Applied: Yes

Main Sponsor

General Methodology

Co-Sponsors

International Association for Statistical Computing
Section on Statistical Computing

Presentations

100 square miles: The Integration of Information for Understanding Risk

As statisticians, we have expert knowledge of how to link measured physical and population information to address challenges and advance humankind. For example, environmental risk assessment examines risks to human health from environmental externalities by linking population health and demographic information with measured environmental information from multiple modalities. Other risk assessments, for example insurance risk, benefit from similar methodologies. This talk will address linking information in a hyper-local setting, which we call 100 square miles. Issues addressed include changes in temporal and geographic support, differential quality of information, complex associations, and differential sample sizes. The statistical approach explored is linked hierarchical models, with a focus on capturing the inherent sampling and measurement error of the observational series. To the extent possible, we strive toward understanding causal risk relationships, or at least understanding why causality is not well articulated.
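
As a rough illustration of the linked hierarchical modeling idea, here is a minimal sketch (the names and the conjugate-normal structure are illustrative assumptions, not the speaker's actual model) in which local observations carry both differential sample sizes and differential measurement quality:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linked hierarchical model (illustrative only):
#   latent tract-level risk:  theta_i ~ N(mu, tau^2)
#   noisy measurements:       y_ij   ~ N(theta_i, sigma_i^2)
# Tracts differ in sample size n_i and measurement error sigma_i,
# mimicking "differential quality of information" across sources.
mu, tau = 0.0, 1.0
n_tracts = 10
n_i = rng.integers(3, 50, size=n_tracts)          # differential sample sizes
sigma_i = rng.uniform(0.5, 3.0, size=n_tracts)    # differential data quality

theta = rng.normal(mu, tau, size=n_tracts)
ybar = np.array([rng.normal(theta[i], sigma_i[i], n_i[i]).mean()
                 for i in range(n_tracts)])

# Posterior mean of theta_i given (mu, tau, sigma_i): precision-weighted
# shrinkage of each tract mean toward the regional mean -- low-quality or
# small-sample tracts borrow more strength from the hierarchy.
prec_data = n_i / sigma_i**2
prec_prior = 1.0 / tau**2
post_mean = (prec_data * ybar + prec_prior * mu) / (prec_data + prec_prior)
print(np.round(post_mean, 2))
```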

Speaker

Katherine Ensor, Rice University

Accelerated Inference for Partially Observed Markov Processes using Automatic Differentiation

Automatic differentiation (AD) has driven recent advances in machine learning, including deep neural networks and Hamiltonian Monte Carlo methods. Partially observed nonlinear stochastic dynamical systems have proved resistant to AD techniques because widely used particle filter algorithms yield an estimated likelihood function that is discontinuous as a function of the model parameters. We show how to embed two existing AD particle filter methods in a theoretical framework that extends them to a new class of algorithms. This new class permits a bias/variance tradeoff and hence a mean squared error substantially lower than that of the existing algorithms. We develop likelihood maximization algorithms suited to the Monte Carlo properties of the AD gradient estimate. Our algorithms require only a differentiable simulator for the latent dynamic system; by contrast, most previous approaches to AD likelihood maximization for particle filters require access to the system's transition probabilities. Numerical results indicate that a hybrid algorithm using AD to refine a coarse solution from an iterated filtering algorithm shows substantial improvement over current state-of-the-art methods on a challenging scientific benchmark problem.
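
To see why particle filters resist AD, consider a minimal bootstrap filter for a toy linear-Gaussian model (an illustrative sketch, not the authors' framework): even with a fixed random seed, the resampling indices change discontinuously as the parameter varies, so the estimated log-likelihood is not differentiable in theta.

```python
import numpy as np

def bootstrap_pf_loglik(theta, y, n_particles=200, seed=1):
    """Bootstrap particle filter log-likelihood (up to an additive constant)
    for the toy model  x_t = theta * x_{t-1} + N(0,1),  y_t = x_t + N(0,1).
    Even with common random numbers fixed, the estimate is a discontinuous
    function of theta: resampling selects particle indices, and those index
    choices jump as theta varies."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n_particles)
    loglik = 0.0
    for t in range(len(y)):
        x = theta * x + rng.normal(size=n_particles)   # propagate
        logw = -0.5 * (y[t] - x) ** 2                  # measurement density
        c = logw.max()
        w = np.exp(logw - c)
        loglik += c + np.log(w.mean())
        # Multinomial resampling: this discrete index selection is what
        # breaks differentiability of loglik w.r.t. theta.
        idx = rng.choice(n_particles, size=n_particles, p=w / w.sum())
        x = x[idx]
    return loglik

y = np.random.default_rng(7).normal(size=50)
print([round(bootstrap_pf_loglik(th, y), 2) for th in (0.49, 0.50, 0.51)])
```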

Speaker

Edward Ionides, University of Michigan

Bayesian Flexible Modeling of Spatially Resolved Transcriptomic Data

Single-cell RNA-sequencing technologies may provide valuable insights into the composition of different cell types and their functions within a tissue. Recent technologies, such as spatial transcriptomics, enable the measurement of gene expression at the single-cell level along with the spatial locations of these cells in the tissue. Dimension reduction and spatial clustering are two of the most common exploratory analysis strategies for spatial transcriptomic data. However, existing dimension-reduction methods may lose the inherent dependency structure among genes at any spatial location in the tissue and hence provide no insight into gene co-expression patterns. In spatial transcriptomics, the matrix-variate gene expression data, along with the spatial coordinates of the single cells, provide information on both gene expression dependencies and cell spatial dependencies through the row and column covariances. In this work, we propose a flexible Bayesian approach to simultaneously estimate the row and column covariances for matrix-variate spatial transcriptomic data. The posterior estimates of the row and column covariances provide data summaries for downstream exploratory analysis. We illustrate our method with simulations and two analyses of real data generated from a recent spatial transcriptomic platform. Our work elucidates gene co-expression networks as well as clear spatial clustering patterns of the cells.
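
For orientation, a minimal frequentist analogue of the row/column covariance estimation problem is the classic "flip-flop" MLE for a matrix-normal model; the sketch below (illustrative only, not the proposed Bayesian procedure) shows the alternating updates:

```python
import numpy as np

def flip_flop(Xs, n_iter=50):
    """Flip-flop MLE of row covariance U (p x p) and column covariance
    V (q x q) for i.i.d. matrix-normal samples X_k ~ MN(0, U, V).
    In the spatial-transcriptomics analogy, rows index genes and
    columns index cell locations."""
    n, p, q = Xs.shape
    U, V = np.eye(p), np.eye(q)
    for _ in range(n_iter):
        Vinv = np.linalg.inv(V)
        U = sum(X @ Vinv @ X.T for X in Xs) / (n * q)
        Uinv = np.linalg.inv(U)
        V = sum(X.T @ Uinv @ X for X in Xs) / (n * p)
        V /= V[0, 0]   # fix the scale non-identifiability: (U, V) ~ (cU, V/c)
    return U, V

# Simulate and recover: X = A Z B^T with Z iid N(0,1) has row covariance
# A A^T and column covariance B B^T.
rng = np.random.default_rng(0)
p, q, n = 5, 8, 200
A = rng.normal(size=(p, p)); B = rng.normal(size=(q, q))
Xs = np.stack([A @ rng.normal(size=(p, q)) @ B.T for _ in range(n)])
U_hat, V_hat = flip_flop(Xs)   # estimates A A^T and B B^T up to reciprocal scale
```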

Keywords

Spatial transcriptomic data

Bayesian Modeling

Spatial clustering

Gene Networks

MCMC computation 

Co-Author(s)

Yang Ni, Texas A&M University
Arhit Chakrabarti, Texas A&M University
Valeriya Rogovchenko, Texas A&M University

Speaker

Bani Mallick, Texas A&M University

Mixing time of the conditional backward sampling particle filter

The conditional backward sampling particle filter (CBPF) is a powerful Markov chain Monte Carlo sampler for smoothing in general state-space hidden Markov models. It was proposed as an improvement over the conditional particle filter (CPF), which is known to have an O(T^2) computational time complexity under a general 'strong' mixing assumption, where T is the time horizon. While there is empirical evidence of the superiority of the CBPF over the CPF in practice, this has never been theoretically quantified. We show that the CBPF has O(T log T) time complexity under strong mixing. In particular, the CBPF's mixing time is upper bounded by O(log T) for any sufficiently large number of particles N, where N depends only on the mixing assumptions and not on T. We also show that an O(log T) mixing time is optimal. The proof involves the analysis of a novel coupling of two CBPFs, which employs a maximal coupling of two particle systems at each time instant. The coupling is implementable, and can therefore also be used to construct unbiased, finite-variance estimates of functionals with arbitrary dependence on the latent state path, at a total expected cost of O(T log T). We also investigate other couplings and show that some of these alternatives have improved empirical behaviour.
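
A minimal sketch of the maximal-coupling building block referenced above, in its generic single-distribution form (the paper couples entire particle systems, which this sketch does not attempt):

```python
import numpy as np

def maximal_coupling(sample_p, logpdf_p, sample_q, logpdf_q, rng):
    """Draw (X, Y) with X ~ p, Y ~ q, maximizing P(X = Y).
    P(X = Y) equals the total-variation overlap of p and q."""
    x = sample_p(rng)
    if np.log(rng.uniform()) + logpdf_p(x) <= logpdf_q(x):
        return x, x                     # meet on the overlap region
    while True:                         # otherwise rejection-sample Y from q minus the overlap
        y = sample_q(rng)
        if np.log(rng.uniform()) + logpdf_q(y) > logpdf_p(y):
            return x, y

# Example: couple N(0,1) and N(1,1). The unnormalized log-densities are
# valid here because both share the same normalizing constant, which cancels.
# The meeting probability is 2 * Phi(-1/2), about 0.617.
rng = np.random.default_rng(0)
draws = [maximal_coupling(lambda r: r.normal(0, 1),
                          lambda x: -0.5 * x**2,
                          lambda r: r.normal(1, 1),
                          lambda x: -0.5 * (x - 1)**2,
                          rng) for _ in range(10_000)]
print(np.mean([x == y for x, y in draws]))
```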

Keywords

Conditional Particle Filter

Gibbs Sampling

State-space model

Mixing time

Unbiased estimation

Smoothing 

Speaker

Sumeetpal Singh, University of Wollongong

Stream-level flow matching from a Bayesian decision theoretic perspective

Flow matching (FM) is a family of training algorithms for fitting continuous normalizing flows (CNFs). A standard approach to FM, called conditional flow matching (CFM), exploits the fact that the marginal vector field of a CNF can be trained by least-squares regression on the conditional vector field given one or both ends of the flow path. We show that viewing CFM training from a Bayesian decision theoretic perspective on parameter estimation opens the door to generalizations of CFM. We present one such extension by defining conditional probability paths given what we call "streams", or instances of latent stochastic paths that connect pairs of noise and observed data. We advocate modeling these latent streams using Gaussian processes (GPs), whose distributional properties allow sampling from the resulting conditional probability paths without simulating the streams. This GP-based stream-level CFM can substantially reduce the variance in the estimated marginal vector field at moderate computational cost, thereby improving the generated samples under common metrics. It also allows for flexibly linking multiple related training data points and incorporating prior information.
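
For reference, a minimal sketch of the standard CFM regression target for straight-line conditional paths, the baseline that the stream-level GP construction generalizes (the setup is illustrative, not the paper's method):

```python
import numpy as np

def cfm_batch(x0, x1, rng):
    """Standard CFM training triplets for straight-line conditional paths
    x_t = (1 - t) x0 + t x1, whose conditional vector field is u_t = x1 - x0.
    A stream-level variant would instead sample a latent stochastic path
    (e.g., a GP bridge) connecting x0 and x1."""
    t = rng.uniform(size=(len(x0), 1))
    xt = (1 - t) * x0 + t * x1
    target = x1 - x0
    return t, xt, target

rng = np.random.default_rng(0)
x0 = rng.normal(size=(256, 2))          # noise end of the flow path
x1 = rng.normal(size=(256, 2)) + 4.0    # "data" end (toy)
t, xt, target = cfm_batch(x0, x1, rng)
# Train any regressor v(t, x) by least squares on ||v(t, xt) - target||^2;
# the minimizer estimates the marginal vector field of the CNF.
```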

Keywords

generative models

normalizing flows

nonparametric methods

latent variable models

hierarchical models 

Speaker

Li Ma, Duke University

Variational Bayesian inference of dynamical system models

Parameter estimation for nonlinear dynamical systems represented by ordinary differential equations (ODEs), using noisy and sparse data, is a vital task in many fields. We recently introduced MAGI (MAnifold-constrained Gaussian process Inference), a method that uses a Gaussian process explicitly conditioned on the manifold constraint that the derivative of the Gaussian process must satisfy the ODE system. By doing so, we completely bypass the need for numerical integration and achieve substantial savings in computational time. When the dimension of the underlying ODE system becomes high, the Hamiltonian Monte Carlo employed by MAGI slows down. In this talk we show how Stein variational gradient descent, a variational Bayes method, can significantly speed up the computation. MAGI with variational Bayes is distinct from existing approaches in that it provides a principled statistical construction under a Bayesian framework, incorporating the ODE system through the manifold constraint. We demonstrate the accuracy and speed of the method using realistic examples based on physical experiments, including inference with unobserved system components, which often occur in real experiments.
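
A minimal sketch of a Stein variational gradient descent update on a toy Gaussian target (illustrative only; in MAGI the gradient of the manifold-constrained posterior would replace grad_log_p):

```python
import numpy as np

def svgd_step(x, grad_log_p, step=0.1):
    """One SVGD update: particles x (n x d) move along
    phi(x_i) = mean_j [ k(x_j, x_i) grad log p(x_j) + grad_{x_j} k(x_j, x_i) ],
    with an RBF kernel and the median-heuristic bandwidth."""
    n = len(x)
    sq = ((x[:, None, :] - x[None, :, :]) ** 2).sum(-1)   # pairwise sq. distances
    h = np.median(sq) / np.log(n + 1) + 1e-8              # median heuristic
    K = np.exp(-sq / h)                                   # kernel matrix
    grad = grad_log_p(x)                                  # (n, d)
    # K @ grad is the attraction term; the second term is the kernel-gradient
    # repulsion that keeps the particles spread out.
    phi = (K @ grad + (K.sum(1, keepdims=True) * x - K @ x) * (2.0 / h)) / n
    return x + step * phi

# Toy target: N(mu, I), so grad log p(x) = mu - x.
mu = np.array([2.0, -1.0])
grad_log_p = lambda x: mu - x
x = np.random.default_rng(0).normal(size=(100, 2))
for _ in range(200):
    x = svgd_step(x, grad_log_p)
print(x.mean(0))    # particle mean approaches mu
```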

Keywords

ordinary differential equations

Gaussian process

manifold constraint

gradient descent

variational Bayes

missing component 

Speaker

Samuel Kou, Harvard University