Of Methods and Models built using the NIMBLE R software

Michele Peruzzi Chair
University of Michigan
 
Sally Paganin Organizer
The Ohio State University
 
Thursday, Aug 7: 8:30 AM - 10:20 AM
0770 
Topic-Contributed Paper Session 
Music City Center 
Room: CC-209C 

Keywords

nimble

hierarchical models

estimation methods 

Applied

Yes

Main Sponsor

Section on Statistical Computing

Co Sponsors

International Society for Bayesian Analysis (ISBA)
Section on Bayesian Statistical Science

Presentations

Bayesian Model Assessment using NIMBLE

Posterior predictive p-values (ppps) have become popular tools for Bayesian model assessment, being general-purpose and easy to use. However, interpretation can be difficult because their distribution is not uniform under the hypothesis that the model did generate the data. Calibrated ppps (cppps) can be obtained via a bootstrap-like procedure, yet remain unavailable in practice due to high computational cost. This work introduces methods to enable efficient approximation of cppps and their uncertainty for fast model assessment. We first investigate the computational tradeoff between the number of calibration replicates and the number of MCMC samples per replicate. Provided that the MCMC chain from the real data has converged, using short MCMC chains per calibration replicate can save significant computation time compared to naive implementations, without significant loss in accuracy. We propose different variance estimators for the cppp approximation, which can be used to quickly confirm the lack of evidence against model misspecification. As variance estimation uses effective sample sizes of many short MCMC chains, we show these can be approximated well from the real-data MCMC chain. The procedure for cppp is implemented in NIMBLE, a flexible framework for hierarchical modeling that supports many models and discrepancy measures.
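The ppp and its bootstrap-like calibration can be illustrated with a minimal sketch. This is not the NIMBLE implementation described in the abstract: it assumes a toy conjugate model (y_i ~ Normal(mu, 1) with a Normal(0, 10^2) prior on mu), so exact posterior draws stand in for MCMC chains, and it uses the sample variance as the discrepancy. All function names are hypothetical.

```python
import random

def sample_variance(values):
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)

def posterior_params(y, prior_sd=10.0):
    """Posterior mean and sd of mu for y_i ~ Normal(mu, 1), mu ~ Normal(0, prior_sd^2)."""
    precision = 1.0 / prior_sd ** 2 + len(y)
    return sum(y) / precision, precision ** -0.5

def ppp(y, rng, n_draws=100):
    """Monte Carlo estimate of P(D(y_rep) >= D(y) | y), with D = sample variance."""
    post_mean, post_sd = posterior_params(y)
    observed = sample_variance(y)
    exceed = 0
    for _ in range(n_draws):
        mu = rng.gauss(post_mean, post_sd)        # exact posterior draw (assumption: no MCMC needed)
        y_rep = [rng.gauss(mu, 1.0) for _ in y]   # replicated data set
        exceed += sample_variance(y_rep) >= observed
    return exceed / n_draws

def cppp(y, rng, n_calibration=50, n_draws=100):
    """Calibrate the ppp: fraction of posterior predictive data sets whose ppp is <= the observed ppp."""
    ppp_obs = ppp(y, rng, n_draws)
    post_mean, post_sd = posterior_params(y)
    below = 0
    for _ in range(n_calibration):
        mu = rng.gauss(post_mean, post_sd)
        y_cal = [rng.gauss(mu, 1.0) for _ in y]   # calibration replicate
        below += ppp(y_cal, rng, n_draws) <= ppp_obs
    return ppp_obs, below / n_calibration

rng = random.Random(2025)
data = [rng.gauss(0.0, 1.0) for _ in range(20)]
print(cppp(data, rng))
```

The nested loop makes the cost visible: every calibration replicate needs its own set of posterior draws, which is the tradeoff between replicates and samples per replicate that the abstract investigates.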

Speaker

Sally Paganin, The Ohio State University


Hidden Markov Individual-level Models of Infectious Disease Transmission

Individual-level models of infectious disease transmission are increasingly used to help understand the transmission dynamics of various diseases. However, fitting such models to individual-level epidemic data is challenging, as we often only know when an individual was detected and not when they were infected or removed. To account for missing infection and removal times, we first assume the epidemiological states of the individuals (e.g., susceptible, infectious, or recovered) follow a series of hidden coupled first-order Markov chains. The observed detection times are then generated conditional on the states of the chains using autoregressive Bernoulli models. Bayesian coupled hidden Markov models have been used for individual-level epidemic data before. However, these approaches assumed each individual was continuously tested and that the tests were independent. Often, individuals are only tested until their first positive test, and multiple tests on the same individual might not be independent. We accommodate these scenarios by assuming the probability of detecting the disease can depend on past observations, which makes the approach applicable to a much wider range of practical settings. Our approach only requires the initial detection time of each detected individual. Also, unlike more traditional data augmentation methods, we do not assume this detection time corresponds to infection or removal or that infected individuals must at some point be detected. We illustrate the flexibility of our approach by fitting two examples: an experiment on the spread of tomato spotted wilt virus in pepper plants and an outbreak of norovirus among nurses in a hospital. All models are fit under a unified Bayesian framework using the individual forward filtering backward sampling algorithm implemented with NIMBLE's custom sampler feature.
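As a rough illustration of the forward filtering backward sampling idea, the sketch below draws one hidden state path for a single discrete hidden Markov chain. It is a simplification under stated assumptions: the abstract's individual-level variant handles coupled chains and autoregressive observations, neither of which appears here, and the function name and argument layout are hypothetical.

```python
import random

def ffbs(init, trans, emis, obs, rng):
    """Draw one state path from p(states | obs) for a discrete HMM.

    init[k]     = P(state_0 = k)
    trans[j][k] = P(state_{t+1} = k | state_t = j)
    emis[k][o]  = P(obs_t = o | state_t = k)
    """
    n_states = len(init)
    states = range(n_states)

    # Forward filtering: filtered[t][k] = P(state_t = k | obs_0..t)
    filtered = []
    for t, o in enumerate(obs):
        if t == 0:
            predicted = init
        else:
            prev = filtered[-1]
            predicted = [sum(prev[j] * trans[j][k] for j in states) for k in states]
        unnorm = [predicted[k] * emis[k][o] for k in states]
        total = sum(unnorm)
        filtered.append([u / total for u in unnorm])

    # Backward sampling: draw the last state, then each earlier state given its successor
    path = [0] * len(obs)
    path[-1] = rng.choices(states, weights=filtered[-1])[0]
    for t in range(len(obs) - 2, -1, -1):
        weights = [filtered[t][j] * trans[j][path[t + 1]] for j in states]
        path[t] = rng.choices(states, weights=weights)[0]
    return path

rng = random.Random(7)
init = [0.9, 0.1]                      # mostly susceptible at the start (illustrative values)
trans = [[0.8, 0.2], [0.1, 0.9]]
emis = [[0.95, 0.05], [0.4, 0.6]]      # imperfect detection of state 1
print(ffbs(init, trans, emis, [0, 0, 1, 0, 1], rng))
```

Because the whole path is drawn jointly from its conditional distribution, the missing infection and removal times are updated in one block rather than one time point at a time.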

Speaker

Dirk Douwes-Schultz, University of Calgary

MCMC Extensibility: New MCMC samplers in NIMBLE

The nimble R package offers a Markov chain Monte Carlo (MCMC) engine capable of operating on generically specified hierarchical statistical models written in the BUGS language. Here, we focus on the extensibility of nimble's MCMC system, describing how new MCMC samplers can be written and readily incorporated into the MCMC algorithm. We demonstrate how users can author their own MCMC samplers or modify the preexisting sampling algorithms provided with nimble. We also present several recent additions to nimble's library of sampling algorithms, including the gradient-based Hamiltonian Monte Carlo (HMC) and Barker proposal samplers.

Keywords

Markov chain Monte Carlo

nimble

MCMC

Bayesian Statistics 

Co-Author(s)

Perry De Valpine, UC Berkeley, Environmental Science, Policy & Management
Christopher Paciorek, University of California, Berkeley

Speaker

Daniel Turek, Lafayette College

One size does not fit all: Customizing MCMC methods for hierarchical models using NIMBLE

Improved efficiency of Markov chain Monte Carlo facilitates all aspects of statistical analysis with Bayesian hierarchical models. Identifying strategies to improve MCMC performance is becoming increasingly crucial as the complexity of models, and the run times to fit them, increase. We evaluate different strategies for improving MCMC efficiency with the open-source software NIMBLE (R package nimble), using common ecological models of species occurrence and abundance as examples. We ask how MCMC efficiency depends on model formulation, model size, data, and sampling strategy. For multiseason and/or multispecies occupancy models and for N-mixture models, we compare the efficiency of sampling discrete latent states vs. integrating over them, including more vs. fewer hierarchical model components, and univariate vs. block-sampling methods. We include the common MCMC tool JAGS in comparisons. For simple models, there is little practical difference between computational approaches. As model complexity increases, there are strong interactions between model formulation and sampling strategy on MCMC efficiency. There is no one-size-fits-all best strategy, but rather problem-specific best strategies related to model structure and type. In all but the simplest cases, NIMBLE's default or customized performance achieves much higher efficiency than JAGS. In the two most complex examples, NIMBLE was 10–12 times more efficient than JAGS. We find NIMBLE is a valuable tool for many ecologists utilizing Bayesian inference, particularly for complex models where JAGS is prohibitively slow. Our results highlight the need for more guidelines and customizable approaches to fit hierarchical models to ensure practitioners can make the most of occupancy and other hierarchical models. By implementing model-generic MCMC procedures in open-source software, including the NIMBLE extensions for integrating over latent states (implemented in the R package nimbleEcology), we have made progress toward this aim.
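To make "integrating over latent states" concrete, here is a minimal sketch for the simplest case: a single-season occupancy model with constant occupancy probability psi and detection probability p (both assumptions chosen for illustration). It does not use nimble or nimbleEcology, and the function name is hypothetical.

```python
import math

def occupancy_log_lik(detections, psi, p):
    """Log-likelihood of one site's detection history with the occupancy state summed out.

    detections: list of 0/1 outcomes, one per visit
    psi: probability the site is occupied
    p:   per-visit detection probability given occupancy
    """
    n_visits = len(detections)
    n_detected = sum(detections)
    # Branch 1: site occupied, detections are independent Bernoulli(p) draws
    occupied = psi * p ** n_detected * (1.0 - p) ** (n_visits - n_detected)
    # Branch 2: site unoccupied, which is only possible if nothing was ever detected
    unoccupied = (1.0 - psi) if n_detected == 0 else 0.0
    return math.log(occupied + unoccupied)

print(occupancy_log_lik([0, 1, 0], psi=0.6, p=0.3))
print(occupancy_log_lik([0, 0, 0], psi=0.6, p=0.3))
```

With the latent state summed out, an MCMC algorithm only has to sample psi and p; the alternative formulation keeps one discrete occupancy indicator per site in the model and samples it alongside the parameters.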

Co-Author(s)

Perry De Valpine, UC Berkeley, Environmental Science, Policy & Management
Nicholas Michaud, University of California, Berkeley
Daniel Turek, Lafayette College

Speaker

Lauren Ponisio, University of Oregon

Presentation

Speaker

Wei Zhang, University of Glasgow