Contributed Poster Presentations: Section on Bayesian Statistical Science

Ryan Peterson, Chair
University of Colorado - Anschutz Medical Campus
 
Tuesday, Aug 6: 10:30 AM - 12:20 PM
6029 
Contributed Posters 
Oregon Convention Center 
Room: CC-Hall CD 

Main Sponsor

Section on Bayesian Statistical Science

Presentations

01 A Non-Stationary Bayesian Species Distribution Model and Its Application to Marine Megafauna

Let us start with a simple illustration: a fish species that lives in shallow coastal waters. For this species, a set of islands is an impermeable barrier: there is no scenario in which the fish crosses it. There may also be sand patches whose water coverage varies with the tide. These sand patches cannot be treated as permanently impermeable barriers, since fish will be present there, just less often than in unobstructed areas. This setup is common in practice, yet no existing model handles it. We propose a Transparent Barrier model that can deal with such complex barrier scenarios. Moreover, it relies on a Matérn field, making it as efficient as the classic stationary models in spatial statistics. The Transparent Barrier model is based on interpreting the Matérn correlation as a collection of paths through a Simultaneous Autoregressive (SAR) model, manipulating local dependencies to cut off paths that cross physical barriers, and formulating the result as a stochastic partial differential equation (SPDE) for well-behaved discretization. We then include a transparency parameter to explicitly model barriers with different levels of permeability. 
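
As a notational sketch (following the barrier-model literature; the exact operator scaling used by the authors may differ), the barrier variant of the Matérn SPDE lets the spatial range r(s) vary in space:

```latex
u(s) - \nabla \cdot \frac{r(s)^2}{8} \nabla u(s)
  = r(s)\sqrt{\frac{\pi}{2}}\,\sigma_u\,\mathcal{W}(s),
\qquad
r(s) =
\begin{cases}
  r   & s \in \text{open water},\\
  r_b & s \in \text{barrier area},
\end{cases}
```

where the classic Barrier model takes r_b close to zero (impermeable), and a transparency parameter would let r_b vary between 0 and r for partially permeable barriers such as the tidal sand patches above.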

Keywords

Spatial distribution model

Non-stationary Gaussian random field

Barrier model

Coastline and island problem

Stochastic Partial Differential Equations (SPDE)

INLA 

Abstracts


First Author

Martina Le-Bert Heyl

Presenting Author

Martina Le-Bert Heyl

03 Bayesian Additive Regression Trees in Complex Surveys

Complex surveys have gained substantial importance across diverse domains, spanning the social sciences, public health, and market research. Their pivotal role lies in furnishing representative estimates while adeptly addressing survey design effects. When the effects of various covariates are unknown, parametric approaches may prove insufficient for handling survey design impacts. Additionally, the Gaussian error assumption is inappropriate in many applications where the response distribution is heavy-tailed or skewed. This paper introduces the Bayesian Additive Regression Trees (BART) framework, a potent and adaptable approach tailored to analyzing complex survey data, specifically with subject weights. We propose an extension of BART that models heavy-tailed and skewed error distributions while accounting for subject weights. Its ability to account for survey design features, handle non-linearity, and provide uncertainty estimates makes it a valuable tool for researchers and practitioners working with complex survey data. 
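
One plausible form of the weighted model (a sketch consistent with the abstract, not necessarily the authors' exact parameterization) is

```latex
y_i = \sum_{t=1}^{T} g(x_i;\, \mathcal{T}_t, \mathcal{M}_t) + \frac{\sigma}{\sqrt{w_i}}\,\varepsilon_i,
```

where w_i are the subject weights, the trees 𝒯_t with leaf parameters ℳ_t carry the usual BART priors, and ε_i is drawn from a heavy-tailed or skewed error family rather than the standard Gaussian.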

Keywords

Bayesian nonparametrics

Bayesian additive regression trees

Complex survey 


Co-Author(s)

Debajyoti Sinha, Florida State University
Dipankar Bandyopadhyay, Virginia Commonwealth University
Antonio Linero

First Author

Abhishek Mandal

Presenting Author

Abhishek Mandal

04 Cross-Validation for the Log-Gaussian Cox Process

The log-Gaussian Cox process (LGCP) is arguably one of the most widely used model-based strategies for analyzing spatial point pattern (SPP) data. In practice, we usually have several models of increasing complexity that we need to criticize, validate, and whose assumptions we need to assess. This work attempts to provide a practical solution, under a Bayesian framework, to some of these problems using cross-validation (CV). The challenge is that, contrary to the traditional CV approach based on the expected log pointwise predictive density, SPP analysis has no notion of a data point to be removed, which requires a group-wise or region-wise definition of the log predictive density. For this purpose, we propose a natural extension of the expected log predictive density, better suited to LGCPs, that could be termed the expected log region-wise or group-wise predictive density. We also provide a very accurate, fast, and deterministic approximation obtained from a single run of the model, which we validate with Monte Carlo samples. We expect to make the solution available in the R-INLA software. 
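
In symbols (our sketch of the proposed criterion, with notation that is ours), for a partition {A_k} of the study region the region-wise analogue of the expected log pointwise predictive density would be

```latex
\mathrm{ELPD}_{\text{region}}
  \;=\; \sum_{k=1}^{K} \mathbb{E}\,\log p\!\left( Y_{A_k} \mid Y \setminus Y_{A_k} \right),
```

where Y_{A_k} denotes the restriction of the observed point pattern to region A_k, playing the role of the held-out "data point" in ordinary CV.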

Keywords

Log-Gaussian Cox process

cross validation

INLA

model selection 


Co-Author

Haavard Rue, Statistics Program, CEMSE, KAUST

First Author

Djidenou Montcho, Statistics Program, CEMSE, KAUST

Presenting Author

Djidenou Montcho, Statistics Program, CEMSE, KAUST

05 Dependent Dirichlet Process Estimation of Heterogeneous Treatment Effects for Confounded Treatment

In observational studies, no unmeasured confounding (ignorability of the treatment assignment) is typically assumed in order to identify the causal effect. However, this assumption is untestable and often fails to hold in practice. Recent work has shown that when a resistant population is available, the conditional average treatment effect on the treated can still be identified without assuming ignorability of the treatment assignment. This estimand, Resistant Population Calibration Of Variance (RPCOVA), however, requires estimation of the conditional variance function, unlike other estimands such as inverse probability weighting, differences in conditional expectations, and doubly robust estimands. We propose a nonparametric Bayesian approach to inference for this estimand using a dependent Dirichlet process to model the response. We establish weak consistency of the estimator and explore its finite-sample performance in simulations. 

Keywords

Causal Inference

Unmeasured Confounders

Gibbs Sampler

Nonparametric Bayesian

Conditional Average Treatment Effect on the Treated 


Co-Author(s)

Bikram Karmakar, University of Florida
Michael Daniels, University of Florida

First Author

Animesh Mitra, University of Florida

Presenting Author

Animesh Mitra, University of Florida

07 Hamiltonian Monte Carlo to Estimate the Burr III Distribution

In this poster presentation, we apply Hamiltonian Monte Carlo (HMC) to estimate the three parameters of the Burr III distribution and compare the estimates from HMC with results from the Metropolis-Hastings algorithm. We then use HMC to analyze the Arthritis Relief Times data. 
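
As an illustrative sketch of the approach (our own minimal example, not the authors' code), the following numpy-only HMC sampler targets the three-parameter Burr III posterior on the log scale, with finite-difference gradients standing in for the analytic gradients a production implementation would use:

```python
import numpy as np

def burr3_loglik(x, c, k, s):
    # log-density of the three-parameter Burr III distribution:
    # f(x) = (c*k/s) * (x/s)^(-c-1) * (1 + (x/s)^(-c))^(-(k+1)), x > 0
    z = x / s
    return (np.log(c) + np.log(k) - np.log(s)
            + (-c - 1.0) * np.log(z)
            - (k + 1.0) * np.log1p(z ** (-c)))

def log_post(theta, x):
    # work with theta = (log c, log k, log s), flat prior on that scale
    c, k, s = np.exp(theta)
    return burr3_loglik(x, c, k, s).sum()

def num_grad(theta, x, eps=1e-5):
    # central differences; a real implementation would use analytic gradients
    g = np.zeros(3)
    for i in range(3):
        e = np.zeros(3)
        e[i] = eps
        g[i] = (log_post(theta + e, x) - log_post(theta - e, x)) / (2.0 * eps)
    return g

def hmc(x, n_iter=800, step=0.01, n_leap=25, seed=1):
    rng = np.random.default_rng(seed)
    theta = np.zeros(3)                         # start at c = k = s = 1
    draws = np.empty((n_iter, 3))
    for t in range(n_iter):
        p0 = rng.standard_normal(3)
        th = theta.copy()
        p = p0 + 0.5 * step * num_grad(th, x)   # leapfrog: half momentum step
        for l in range(n_leap):
            th = th + step * p                  # full position step
            if l < n_leap - 1:
                p = p + step * num_grad(th, x)  # full momentum step
        p = p + 0.5 * step * num_grad(th, x)    # final half momentum step
        log_acc = (log_post(th, x) - log_post(theta, x)
                   - 0.5 * (p @ p - p0 @ p0))
        if np.log(rng.uniform()) < log_acc:     # Metropolis correction
            theta = th
        draws[t] = np.exp(theta)
    return draws

# simulate data by inverting the CDF F(x) = (1 + (x/s)^(-c))^(-k)
rng = np.random.default_rng(0)
c_true, k_true, s_true = 2.0, 1.5, 1.0
u = rng.uniform(size=300)
data = s_true * (u ** (-1.0 / k_true) - 1.0) ** (-1.0 / c_true)

draws = hmc(data)
c_hat, k_hat, s_hat = draws[400:].mean(axis=0)  # drop burn-in
```

Working on the log of the parameters keeps all three positive without rejection tricks; nan-valued proposals are rejected automatically by the Metropolis step.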

Keywords

Hamiltonian Monte Carlo

Burr III Distribution 


Co-Author(s)

Olivia Stewart, Slippery Rock University
Alexander Kim, Seneca Valley Senior High School
Kyra Wotorson, Slippery Rock University

First Author

Woosuk Kim, Slippery Rock University

Presenting Author

Alexander Kim, Seneca Valley Senior High School

08 Human–machine Collaboration for Improving Semiconductor Process Development

One of the bottlenecks to building semiconductor chips is the increasing cost required to develop chemical plasma processes that form the transistors and memory storage cells. These processes are still developed manually using highly trained engineers searching for a combination of tool parameters that produces an acceptable result on the silicon wafer. Here we study Bayesian optimization algorithms to investigate how artificial intelligence might decrease the cost of developing complex semiconductor chip processes. In particular, we create a controlled virtual process game to systematically benchmark the performance of humans and computers for the design of a semiconductor fabrication process. We find that human engineers excel in the early stages of development, whereas the algorithms are far more cost-efficient near the tight tolerances of the target. Furthermore, we show that a strategy using both human designers with high expertise and algorithms in a human first–computer last strategy can reduce the cost-to-target by half compared with only human designers. 
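
The benchmarking idea can be sketched as follows; the one-dimensional `process_response` function, the target value, and all tuning constants are hypothetical stand-ins for the paper's multi-parameter virtual process game:

```python
import numpy as np
from math import erf, exp, pi, sqrt

def process_response(x):
    # hypothetical "virtual process": a smooth response (say, an etch
    # depth) as a function of one tool parameter scaled to [0, 1]
    return 50.0 + 120.0 * x - 80.0 * x ** 2

TARGET = 92.0                       # hypothetical target response

def cost(x):
    # cost-to-target: absolute miss from the target
    return abs(process_response(x) - TARGET)

def rbf(a, b, ls=0.15):
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / ls) ** 2)

def gp_posterior(xs, ys, xq, jitter=1e-6):
    # standard GP regression with a unit-variance RBF kernel
    K = rbf(xs, xs) + jitter * np.eye(len(xs))
    Ks = rbf(xq, xs)
    mu = Ks @ np.linalg.solve(K, ys)
    var = 1.0 - np.sum(Ks * np.linalg.solve(K, Ks.T).T, axis=1)
    return mu, np.sqrt(np.clip(var, 1e-12, None))

def norm_pdf(z):
    return exp(-0.5 * z * z) / sqrt(2.0 * pi)

def norm_cdf(z):
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def expected_improvement(mu, sd, best):
    # EI acquisition for minimization of the cost-to-target
    ei = np.zeros_like(mu)
    for i, (m, s) in enumerate(zip(mu, sd)):
        if s > 1e-9:
            z = (best - m) / s
            ei[i] = (best - m) * norm_cdf(z) + s * norm_pdf(z)
    return ei

grid = np.linspace(0.0, 1.0, 201)
xs = [0.1, 0.5, 0.9]                # "human first": seed recipes
ys = [cost(x) for x in xs]
for _ in range(15):                 # "computer last": BO refinement
    xa, ya = np.array(xs), np.array(ys)
    m, sc = ya.mean(), ya.std() + 1e-12
    mu, sd = gp_posterior(xa, (ya - m) / sc, grid)
    ei = expected_improvement(mu, sd, (min(ys) - m) / sc)
    x_next = float(grid[int(np.argmax(ei))])
    xs.append(x_next)
    ys.append(cost(x_next))

best_x = xs[int(np.argmin(ys))]
best_cost = min(ys)
```

The human-first, computer-last split shows up here only as the choice of seed points before the acquisition loop; the paper's finding is that the algorithm's advantage is concentrated in exactly this late, tight-tolerance phase.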

Keywords

semiconductor fabrication process

recipe optimization

Bayesian optimization

virtual process 


Co-Author

Sae Na Park, Lam Research

First Author

Keren Kanarik, Lam Research

Presenting Author

Sae Na Park, Lam Research

10 Nearest Neighbor Gaussian Process Variational Inference for Large Geostatistical Datasets

With the substantial increase in the availability of geostatistical data, statisticians are now equipped to make inference on spatial covariance from large datasets, which is critical in understanding spatial dependence. Traditional methods, such as Markov chain Monte Carlo (MCMC) sampling within a Bayesian framework, can become computationally expensive as the number of spatial locations increases. As an important alternative to MCMC, variational inference approximates the posterior distribution through optimization. In this paper, we propose a nearest neighbor Gaussian process variational inference (NNGPVI) method to approximate the posterior. This method introduces nearest-neighbor-based sparsity in both the prior and the approximated posterior distribution. Doubly stochastic gradient methods are developed for the implementation of the optimization process. Our simulation studies demonstrate that NNGPVI achieves comparable accuracy to MCMC methods but with reduced computational costs. An analysis of satellite temperature data illustrates the practical implementation of NNGPVI and shows that the inference results match those obtained from the MCMC approach. 
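
The nearest-neighbor sparsity referred to above is the usual NNGP (Vecchia-type) factorization; schematically, for latent values w_1, ..., w_n in some fixed ordering,

```latex
p(w_1, \dots, w_n) \;\approx\; \prod_{i=1}^{n} p\!\left( w_i \mid w_{N(i)} \right),
```

where N(i) indexes at most m nearest previously ordered neighbors. In NNGPVI the same conditional-independence structure is imposed on the variational family q, not only on the prior.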

Keywords

Bayesian Modeling

Spatial Statistics

Variational Inference

Gaussian Process

Nearest Neighbor 


Co-Author

Abhirup Datta, Johns Hopkins University

First Author

Jiafang Song

Presenting Author

Jiafang Song

12 Scalable Bayesian Estimation of Gaussian Graphical Models

Gaussian graphical models (GGMs) encode the conditional independence structure between multivariate normal random variables as zero entries in the precision matrix. They are powerful tools with diverse applications in genetics, portfolio optimization, and computational neuroscience. Bayesian approaches have advantages over frequentist methods because they encourage sparsity in the graph, incorporate prior information, and account for uncertainty in the graph structure. However, due to the computational burden of MCMC, scalable Bayesian estimation of GGMs remains an open problem. We propose a novel approach based on empirical Bayes nodewise regression that allows for efficient estimation of the precision matrix and flexibility in incorporating prior information in high-dimensional settings. Empirical Bayes variable selection methods considered in our study include SEMMS, Zellner's g-prior, and nonlocal priors. If necessary, a post-filling model selection step is used to discover the underlying graph. Simulation results show that our Bayesian method compares favorably with competing methods in terms of accuracy metrics and excels in computational speed. 
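
A stripped-down version of the nodewise-regression idea (with plain OLS and a hard threshold standing in for the empirical Bayes variable selection methods named in the abstract) might look like:

```python
import numpy as np

rng = np.random.default_rng(42)
p, n = 5, 4000

# tridiagonal precision matrix: each variable is conditionally
# dependent only on its chain neighbors
Omega = 2.0 * np.eye(p)
for i in range(p - 1):
    Omega[i, i + 1] = Omega[i + 1, i] = -0.8
Sigma = np.linalg.inv(Omega)
X = rng.standard_normal((n, p)) @ np.linalg.cholesky(Sigma).T

def nodewise_graph(X, thresh=0.2):
    # regress each node on all others; the population coefficient of
    # node k in the regression of node j is -Omega[j, k] / Omega[j, j],
    # so zeros in the precision matrix appear as zero coefficients.
    # Keep edge (j, k) when both directions survive the threshold (AND rule).
    n, p = X.shape
    B = np.zeros((p, p))
    for j in range(p):
        others = [k for k in range(p) if k != j]
        beta, *_ = np.linalg.lstsq(X[:, others], X[:, j], rcond=None)
        B[j, others] = beta
    A = (np.abs(B) > thresh) & (np.abs(B.T) > thresh)
    return A.astype(int)

A_hat = nodewise_graph(X)
A_true = (np.abs(Omega) > 1e-8).astype(int) - np.eye(p, dtype=int)
```

In the paper, the OLS-plus-threshold step is replaced by empirical Bayes variable selection per node, which is what supplies sparsity and prior information without a full joint MCMC over graphs.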

Keywords

Gaussian graphical model

high-dimensional statistics

network analysis

empirical Bayes

nodewise regression

sparsity 


Co-Author(s)

Sumanta Basu, Cornell University
Martin Wells, Cornell University

First Author

Ha Nguyen

Presenting Author

Ha Nguyen

13 Scalable M-Open Model Selection in Large Data Settings

We consider the variable selection problem for linear models in the M-open setting, where the data generating process is outside the model space. We focus on the novel problem of Model Superinduction, which refers to the tendency of model selection procedures to exponentially favor larger models as the sample size grows, resulting in overparametrized models which induce severe computational difficulties. We prove the existence of this phenomenon for popular classes of model selection priors, such as mixtures of g-priors and the family of spike and slab priors. We further show this behavior is inescapable for any KL-divergence minimizing model selection procedure, so we seek to minimize its effects for large n, while preserving posterior consistency. We propose variants of the aforementioned priors that result in a slowly diminishing rate of prior influence on the posterior, which favors simpler models while preserving consistency. We further propose a model space prior which induces stronger model complexity penalization for large sample sizes. We demonstrate the efficacy of our proposed solutions via synthetic data examples and a case study using albedo data from GOES satellites. 

Keywords

Model selection

Bayesian decision theory

M-open model comparison

Linear Models

Spike and Slab prior

g-prior 


Co-Author

Bruno Sanso, University of California-Santa Cruz

First Author

Jacob Fontana

Presenting Author

Jacob Fontana

14 The R2D2 Selection Prior for Survival Regression

The number of available covariates in medical data expands with each passing year, making identification of the most influential factors pivotal in survival regression modeling. Bayesian analyses focusing on variable selection are a common approach to this problem; however, most use approximations of the posterior to perform this task. In this paper, we propose placing a beta prior directly on the model's coefficient of determination (Bayesian R2), which acts as a shrinkage prior on the global variance of the predictors. Through reparameterization using an auxiliary variable, we are able to update a majority of the parameters with sequential Gibbs sampling, reducing reliance on approximate posterior inference and simplifying computation. Performance relative to competing variable selection priors is showcased through an extensive simulation study in both censored and non-censored settings. Finally, the method is applied to identifying influential built-environment risk factors affecting the survival time of Medicare-eligible patients in California with cardiovascular ailments. 
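
Schematically (a sketch of the R2D2 construction; the survival/AFT adaptation in the paper adds censoring and the auxiliary-variable reparameterization), the prior is placed on

```latex
R^2 \;=\; \frac{\operatorname{var}(x^\top \beta)}{\operatorname{var}(x^\top \beta) + \sigma^2}
\;\sim\; \mathrm{Beta}(a, b),
```

which induces a beta-prime prior on the global variance component W = R^2 / (1 - R^2), shrinking all predictors jointly while remaining interpretable on the familiar R^2 scale.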

Keywords

Survival

AFT

Global-Local

Bayesian 

Co-Author(s)

Eric Yanchenko, North Carolina State University
Ana Rappold, US EPA
Brian Reich, North Carolina State University

Presenting Author

Brandon Feng, North Carolina State University

15 Tipping Point Analysis in Network Meta-Analysis

While network meta-analysis (NMA) facilitates simultaneous assessment of multiple treatments, challenges such as sparse direct comparisons among treatments persist, making accurate estimation of the correlation between treatments in arm-based NMA (AB-NMA) difficult. To address these challenges and complement the analysis, we develop a novel sensitivity analysis tool tailored to AB-NMA: a tipping point analysis within the Bayesian framework that targets the correlation parameters and assesses their influence on the robustness of conclusions about relative treatment effects, including changes in statistical significance and in the magnitude of point estimates. Applying the analysis to multiple NMA datasets with 112 treatment pairs, we identified tipping points in 13 pairs (11.6%) for significance change and in 29 pairs (25.9%) for magnitude change with a threshold of 15%. Our results underscore the potential commonality of tipping points, emphasizing the necessity of the proposed analysis, especially in networks with sparse direct comparisons or wide credible intervals for the estimated correlations. 

Keywords

network meta-analysis

correlation between multiple treatments

tipping point analysis

sensitivity analysis

robustness of research conclusion

statistical significance 


Co-Author(s)

Thomas Murray, University of Minnesota
Wenshan Han, Florida State University
Lifeng Lin
Lianne Siegel, University of Minnesota
Haitao Chu, Pfizer

First Author

Zheng Wang, University of Minnesota

Presenting Author

Zheng Wang, University of Minnesota

16 Using the Truncated Normal Distribution for Bayes Factors in Hierarchical Model Selection

In identification-of-source problems within forensic science, the examiner is tasked with providing a summary of evidence that allows a decision maker to evaluate the source of that evidence. The data encountered in forensic identification-of-source problems often have a hierarchical structure, with a within-source and a between-source distribution for each object in a sample. One method of providing this summary of evidence is a likelihood ratio (LR) or Bayes factor (BF). With these methods, the two densities are often estimated separately and then the ratio is reported, which can lead to instances where the resulting LR is large simply because the density in the denominator is small. In this work, we explore the use of the truncated normal distribution in LRs and BFs to attempt to alleviate this phenomenon. We also begin to characterize the robustness of these truncated normal LR methods. 
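
A minimal sketch of a truncated normal likelihood ratio; the feature bounds, data, and plug-in parameter estimates below are hypothetical, and the paper's hierarchical BF construction is not reproduced here:

```python
import numpy as np
from math import erf, log, pi, sqrt

def truncnorm_logpdf(x, mu, sigma, a, b):
    # log-density of a Normal(mu, sigma^2) truncated to the interval [a, b]
    if x < a or x > b:
        return -np.inf
    Phi = lambda t: 0.5 * (1.0 + erf(t / sqrt(2.0)))
    z = (x - mu) / sigma
    log_kernel = -0.5 * z * z - log(sigma) - 0.5 * log(2.0 * pi)
    log_mass = log(Phi((b - mu) / sigma) - Phi((a - mu) / sigma))
    return log_kernel - log_mass            # renormalize to the bounds

def log_lr(e, src, bkg, a, b):
    # log LR = log f_source(e) - log f_background(e); both densities are
    # plug-in truncated normals supported on the plausible range [a, b]
    mu_s, sd_s = float(np.mean(src)), float(np.std(src, ddof=1))
    mu_b, sd_b = float(np.mean(bkg)), float(np.std(bkg, ddof=1))
    return (truncnorm_logpdf(e, mu_s, sd_s, a, b)
            - truncnorm_logpdf(e, mu_b, sd_b, a, b))

# hypothetical feature measurements on a bounded scale
rng = np.random.default_rng(7)
a, b = 0.0, 10.0
src = np.clip(rng.normal(6.0, 0.5, size=30), a, b)    # within-source sample
bkg = np.clip(rng.normal(4.0, 1.0, size=200), a, b)   # background sample
llr = log_lr(6.2, src, bkg, a, b)
```

Restricting both densities to the physically plausible range [a, b] keeps the denominator from being evaluated arbitrarily far into an unbounded tail, which is the phenomenon the abstract describes.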

Keywords

forensic source identification

value of evidence

likelihood ratio

truncated normal distribution 


Co-Author(s)

Semhar Michael, South Dakota State University
Christopher Saunders, South Dakota State University

First Author

Dylan Borchert, South Dakota State University

Presenting Author

Dylan Borchert, South Dakota State University

6 Dir-SPGLM: A Bayesian Semiparametric GLM with Data-driven Reference Distribution

The recently developed semiparametric generalized linear model (SPGLM) offers more flexibility than the classical GLM by including the baseline, or reference, distribution of the response as an additional model parameter. However, some inference summaries are not easily generated under existing maximum-likelihood-based inference (ML-SPGLM), including uncertainty in estimation for model-derived functionals such as exceedance probabilities. The latter are critical in clinical diagnostic and decision-making settings. In this article, by placing a Dirichlet prior on the baseline distribution, we propose a Bayesian model-based approach to inference that addresses these important gaps. We establish consistency and asymptotic normality results for the implied canonical parameter. Simulation studies and an illustration with data from an aging research study confirm that the proposed method performs comparably to or better than ML-SPGLM. The proposed Bayesian framework is most attractive for inference with small training samples or in sparse-data scenarios. 
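
In sketch form (standard SPGLM notation; the Dirichlet prior on the reference distribution is the paper's addition), the model exponentially tilts a reference distribution f_0:

```latex
p(y \mid x;\, f_0, \theta) \;=\; f_0(y)\, \exp\!\big\{ \theta(x)\, y \;-\; b(\theta(x); f_0) \big\},
\qquad f_0 \sim \mathrm{Dirichlet}(\alpha),
```

where b(θ; f_0) = log Σ_y f_0(y) e^{θ y} is the normalizer over the response support, so that posterior draws of f_0 directly induce posterior uncertainty for functionals such as exceedance probabilities.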

Keywords

Ordinal regression

Nonparametric Bayes

Exceedance probabilities

Skewed Dirichlet

Dependent Dirichlet process 


Co-Author(s)

Paul Rathouz, University of Texas at Austin, Dell Medical School
Peter Mueller, UT Austin

First Author

Entejar Alam

Presenting Author

Entejar Alam

9 Implementation of Statistical Features of a Bayesian Two-armed RAR Trial

Bayesian adaptive designs with response adaptive randomization (RAR) have the potential to benefit more participants in a clinical trial. While there are many papers that describe RAR designs and results, there is a scarcity of works reporting the details of RAR implementation from an exclusively statistical point of view. In this paper, we introduce the statistical methodology and implementation of the Changing the Default (CTD) trial. CTD is a single-center prospective RAR comparative effectiveness trial comparing opt-in and opt-out tobacco treatment approaches for hospitalized patients. The design assumed an uninformative prior, a conservative initial allocation ratio, and a higher threshold for stopping for success to protect results from statistical bias. A particular emerging concern with RAR designs is the possibility of time trends arising during the implementation of a trial. If there is a time trend and the analytic plan does not prespecify an appropriate model, the trial could be biased. Adjustment for a time trend was not prespecified in CTD, but a post hoc time-adjusted analysis showed no influential drift. 
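
For illustration only (this is a generic Thompson-type allocation rule with a dampening power, not CTD's prespecified algorithm), a two-armed beta-binomial RAR update could look like:

```python
import numpy as np

def allocation_prob(s1, n1, s2, n2, power=0.5, n_draws=200_000, seed=3):
    # posterior P(p1 > p2) under independent Beta(1, 1) priors, estimated
    # by Monte Carlo, then dampened (power < 1) so the allocation ratio
    # moves away from 1:1 conservatively
    rng = np.random.default_rng(seed)
    p1 = rng.beta(1 + s1, 1 + n1 - s1, size=n_draws)
    p2 = rng.beta(1 + s2, 1 + n2 - s2, size=n_draws)
    post = float((p1 > p2).mean())
    w1, w2 = post ** power, (1.0 - post) ** power
    return w1 / (w1 + w2)
```

The dampening power plays the role of a conservative initial allocation ratio: early, noisy posteriors move the randomization probability only slightly, which also limits susceptibility to the time-trend bias discussed above.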

Keywords

drift analysis

comparative effectiveness trial

Bayesian adaptive designs 


Co-Author(s)

Kimber Richter, University of Kansas Medical Center
Chuanwu Zhang, Sanofi
Laura Mussulman, University of Kansas Medical Center
Niaman Nazir, University of Kansas Medical Center
Byron Gajewski, University of Kansas Medical Center

First Author

Elena Shergina, University of Kansas Medical Center

Presenting Author

Elena Shergina, University of Kansas Medical Center