Current Developments in Nonparametric Bayes

Chair

Christopher Hans, The Ohio State University
 
Monday, Aug 4: 2:00 PM - 3:50 PM
4070 
Contributed Papers 
Music City Center 
Room: CC-202B 

Main Sponsor

Section on Bayesian Statistical Science

Presentations

A Bayesian Semiparametric Approach to Conditional Survival Estimation for Heavy-Tailed Data Under Right-Censoring

This study explores estimation of the conditional survival function for heavy-tailed distributions under right-censoring, a prevalent issue in fields such as medical science. We introduce a novel Bayesian semiparametric approach that combines a Dirichlet Process Mixture (DPM) model with the Generalized Pareto Distribution (GPD), enabling robust estimation of conditional survival functions within a unified model. The DPM model efficiently models the central portion of the distribution below a specified threshold, while the GPD describes the tail behavior beyond this threshold. Our approach accommodates random right-censoring and incorporates covariate information, enabling estimation of conditional survival and hazard functions tailored to specific covariate values. This paper presents the first development of Bayesian models in this area, along with simulation studies and real-data applications that demonstrate improved accuracy and reliability of conditional survival function estimates relative to traditional methods. 
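The splicing of a bulk model below a threshold u with a GPD tail above it can be sketched as follows. This is a minimal illustration under our own assumptions, not the authors' implementation: the DPM bulk is replaced by a fixed finite mixture of lognormal components (standing in for one truncated draw), covariates are ignored, and the names `spliced_survival`, `u`, `xi`, and `scale` are hypothetical. It uses one common splicing, S(y) = 1 - H(y) for y <= u and S(y) = (1 - H(u)) (1 + xi (y - u) / scale)^(-1/xi) for y > u, which is continuous at u.

```python
import math

def lognormal_cdf(y, mu, sigma):
    """CDF of a lognormal distribution with log-scale mean mu and sd sigma."""
    if y <= 0:
        return 0.0
    z = (math.log(y) - mu) / sigma
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def bulk_cdf(y, weights, mus, sigmas):
    """Bulk CDF H(y): a finite lognormal mixture, a stand-in (assumption) for a DPM draw."""
    return sum(w * lognormal_cdf(y, m, s) for w, m, s in zip(weights, mus, sigmas))

def spliced_survival(y, u, xi, scale, weights, mus, sigmas):
    """Survival 1 - H(y) below threshold u; (1 - H(u)) times the GPD survival above u."""
    if y <= u:
        return 1.0 - bulk_cdf(y, weights, mus, sigmas)
    tail_mass = 1.0 - bulk_cdf(u, weights, mus, sigmas)
    excess = y - u
    if xi == 0.0:
        gpd_surv = math.exp(-excess / scale)  # exponential limit of the GPD
    else:
        base = 1.0 + xi * excess / scale
        # For xi < 0 the GPD has a finite upper endpoint; survival is 0 beyond it.
        gpd_surv = base ** (-1.0 / xi) if base > 0 else 0.0
    return tail_mass * gpd_surv
```

In a full model the mixture weights, the threshold, and the GPD parameters would all receive priors; here they are fixed inputs only to show how the two pieces join.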

Keywords

Bayesian Nonparametric

Dirichlet Process Mixture Model

Generalized Pareto Distribution

Right Censoring

Survival Curve Estimation

Extreme Quantile Estimation 

First Author

Arnab Aich

Presenting Author

Arnab Aich

WITHDRAWN: Bayesian Tele-connected Spatial Clustering of Multivariate Spatial Data with Applications to Disease Mapping

Spatial clustering is crucial in disease mapping because it identifies subregions with different patterns of disease incidence or mortality.
This study proposes a novel Bayesian spatial clustering method for multivariate spatial disease data that reveals geographic variation in multivariate disease patterns while accounting for both spatial information and dependence among multiple disease measurements. We develop a new random tele-connected graph partition model with an unknown number of clusters that encourages locally contiguous clusters while allowing remote subregions to be clustered together.
We use this prior in a Bayesian hierarchical model to detect spatial clusters and to estimate cluster-specific disease patterns and dependence across the multivariate disease variables. For posterior inference, we develop a tailored Markov chain Monte Carlo (MCMC) algorithm with efficient doubly split-merge samplers that exploit graph algorithms. We illustrate our method with simulation studies and apply it to investigate clustering patterns in the decline of county-level prostate cancer mortality rates across six southern U.S. states from 1985 to 2014. 
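One plausible reading of a spanning-tree-based partition (suggested by the "Random Spanning Trees" keyword) is that deleting edges from a spanning tree of the region adjacency graph yields contiguous pieces, and tele-connection lets several remote pieces share one cluster label. The sketch below is our own illustration, not the authors' prior: all function names are hypothetical, the spanning tree and cut edges are given rather than sampled, and `stirling2` simply counts the ways k pieces can be grouped into m nonempty clusters (the "Stirling Number of the Second Kind" keyword).

```python
from typing import Dict, List, Tuple

def tree_components(n: int, tree_edges: List[Tuple[int, int]], cut: set) -> List[int]:
    """Label n nodes by connected component after deleting the edges whose indices are in `cut`."""
    parent = list(range(n))

    def find(a: int) -> int:
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    for idx, (a, b) in enumerate(tree_edges):
        if idx not in cut:
            parent[find(a)] = find(b)  # union the endpoints of every kept tree edge
    # Relabel component roots as 0..k-1 in order of first appearance.
    roots: Dict[int, int] = {}
    return [roots.setdefault(find(v), len(roots)) for v in range(n)]

def tele_connect(components: List[int], piece_to_cluster: Dict[int, int]) -> List[int]:
    """Map contiguous pieces to cluster labels; non-adjacent pieces may share a label."""
    return [piece_to_cluster[c] for c in components]

def stirling2(k: int, m: int) -> int:
    """Stirling number of the second kind S(k, m) via S(i, j) = j S(i-1, j) + S(i-1, j-1)."""
    table = [[0] * (m + 1) for _ in range(k + 1)]
    table[0][0] = 1
    for i in range(1, k + 1):
        for j in range(1, m + 1):
            table[i][j] = j * table[i - 1][j] + table[i - 1][j - 1]
    return table[k][m]
```

For a path of six regions with the second and fourth tree edges cut, `tree_components` gives three contiguous pieces, and mapping pieces 0 and 2 to the same label produces a cluster made of two separated subregions.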

Keywords

Inverse Wishart

Random Spanning Trees

Reversible-Jump MCMC

Spatial Clustering

Stirling Number of the Second Kind 

Co-Author(s)

Huiyan Sang
Bani Mallick, Texas A&M University

First Author

Srijato Bhattacharyya, Texas A&M University

Bayesian Variable Selection for Ultra High-Dimensional Semiparametric Additive Partial Linear Models

Semiparametric regression models containing linear and nonlinear additive components generalize multiple linear regression models. We prefer them to fully nonparametric models when some covariates have linear effects. While variable selection for multiple linear regression has been widely studied, work on additive partial linear models (APLMs) is more recent. We develop a Bayesian group selection method for APLMs that uses splines to approximate the nonlinear functions. Our work is based on a hierarchical model with priors on the regression coefficients, the spline coefficients, and the model space. We prove model selection consistency even when the number of predictors grows nearly exponentially with the sample size. We propose a scalable algorithm for exploring very large model spaces and efficiently detecting regions of high posterior probability. We evaluate the performance of our approach under various simulation setups and compare it with other available methods. An analysis of data from a genome-wide association study, with a plant trait measured on 360 observations as the response and nearly a million SNPs and 30,000 gene expressions as predictors, demonstrates the scalability and performance of our approach. 
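Group selection in an APLM operates on a design in which each nonlinear covariate contributes a block of spline columns that enters or leaves the model together. The sketch below only builds that grouped design; it is our own illustration, not the authors' method. The cubic truncated power basis, the quantile knot placement, the standardization, and the function names are all assumptions, and no prior or search algorithm is shown.

```python
import numpy as np

def cubic_truncated_power_basis(x, n_knots=4):
    """Cubic truncated power spline basis (no intercept), knots at interior quantiles of x."""
    knots = np.quantile(x, np.linspace(0, 1, n_knots + 2)[1:-1])
    cols = [x, x**2, x**3] + [np.clip(x - k, 0, None) ** 3 for k in knots]
    return np.column_stack(cols)

def aplm_design(X_lin, X_nonlin, n_knots=4):
    """Stack linear columns and one spline group per nonlinear covariate; return design and group ids."""
    blocks = [X_lin]
    groups = list(range(X_lin.shape[1]))  # each linear covariate forms its own group
    next_group = X_lin.shape[1]
    for j in range(X_nonlin.shape[1]):
        B = cubic_truncated_power_basis(X_nonlin[:, j], n_knots)
        B = (B - B.mean(axis=0)) / B.std(axis=0)  # center and scale the spline columns
        blocks.append(B)
        groups.extend([next_group] * B.shape[1])  # all columns of B share one group id
        next_group += 1
    return np.hstack(blocks), np.array(groups)
```

A model in the search space can then be encoded as a set of selected group ids, with the corresponding columns of the design extracted via `np.isin(groups, selected)`.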

Keywords

Genome-wide association study

Hierarchical Model

Group selection

Stochastic Search

Additive Partial Linear Model

Posterior Prediction 

Co-Author(s)

Somak Dutta, Iowa State University
Vivekananda Roy, Iowa State University

First Author

Debarshi Chakraborty, Iowa State University

Presenting Author

Debarshi Chakraborty, Iowa State University

DPGLM: A Semiparametric Bayesian GLM with Inhomogeneous Normalized Random Measures

We introduce a varying-weight dependent Dirichlet process (DDP) model to implement a semiparametric GLM. The model extends a recently developed semiparametric generalized linear model (SPGLM) by adding a nonparametric Bayesian prior on the baseline distribution of the GLM. We show that the resulting model takes the form of an inhomogeneous normalized random measure that arises from exponential tilting of a normalized completely random measure. Building on familiar posterior simulation methods for mixtures with respect to normalized random measures, we develop posterior simulation for the resulting semiparametric GLM. The proposed methodology is validated through a series of simulation studies and illustrated using data from a speech intelligibility study. 
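The exponential-tilting step can be sketched for a discrete baseline (a Dirichlet process draw is almost surely discrete). This is our own illustration, not the authors' posterior sampler: the baseline is a hypothetical set of `atoms` with positive `weights`, the tilted probabilities are p_h(theta) proportional to w_h exp(theta y_h), and theta is solved by bisection so that the tilted mean equals a target mu, which in a GLM would be g^(-1)(x'beta). The bracket [-50, 50] is an arbitrary assumption.

```python
import math

def tilted_weights(atoms, weights, theta):
    """Exponentially tilt a discrete baseline: w_h * exp(theta * y_h), renormalized."""
    m = max(theta * y for y in atoms)  # subtract the largest exponent for numerical stability
    raw = [w * math.exp(theta * y - m) for y, w in zip(atoms, weights)]
    total = sum(raw)
    return [r / total for r in raw]

def tilted_mean(atoms, weights, theta):
    """Mean of the tilted distribution; it increases in theta (its derivative is the variance)."""
    return sum(y * p for y, p in zip(atoms, tilted_weights(atoms, weights, theta)))

def solve_theta(atoms, weights, mu, lo=-50.0, hi=50.0, tol=1e-10):
    """Find theta whose tilted mean equals mu, by bisection on [lo, hi]."""
    if not min(atoms) < mu < max(atoms):
        raise ValueError("mu must lie strictly inside the range of the atoms")
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if tilted_mean(atoms, weights, mid) < mu:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)
```

Because the tilted mean is strictly increasing in theta, bisection is a simple, robust choice here; a Newton step using the tilted variance would converge faster.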

Keywords

Dependent Dirichlet process

Inhomogeneous normalized random measures

Density regression

Lévy-Khintchine representation

Semiparametric generalized linear model 

Co-Author(s)

Paul Rathouz, University of Texas at Austin, Dell Medical School
Peter Mueller, University of Texas at Austin

First Author

Entejar Alam, University of Texas at Austin

Presenting Author

Entejar Alam, University of Texas at Austin

Generalized Bayesian Additive Regression Trees for Restricted Mean Survival Time Inference

We introduce a generalized Bayes framework for predicting individual-level restricted mean survival times (RMST) without relying on strict survival model assumptions. Our method employs an RMST-targeted loss function based on inverse probability of censoring weights (IPCW), which handles informative censoring by modeling only the censoring distribution. We incorporate a flexible additive tree regression model and construct pseudo-Bayesian posteriors by model-averaging IPCW-conditional loss functions. Through simulations and an application to a multi-site breast cancer cohort, we demonstrate improved predictive performance over standard survival machine learning methods. Additionally, we will describe how this framework can be extended to dynamic RMST prediction. 
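An IPCW loss targeting the RMST at a horizon tau can be sketched as follows. This is a minimal illustration under simplifying assumptions of ours: the censoring survival G is estimated by a marginal Kaplan-Meier curve (a covariate-dependent censoring model would replace it), the predictions `pred` may come from any model rather than the authors' tree ensemble, and the function names are hypothetical. A subject whose truncated time min(T, tau) is observed receives weight 1 / G(min(Y, tau)-); every other subject receives weight zero.

```python
import numpy as np

def censoring_km_left(times, events, t):
    """Kaplan-Meier estimate of G(t-) = P(C >= t), treating censorings (events == 0) as failures."""
    g = 1.0
    for s in np.unique(times[events == 0]):  # sorted distinct censoring times
        if s >= t:
            break
        at_risk = np.sum(times >= s)
        n_cens = np.sum((times == s) & (events == 0))
        g *= 1.0 - n_cens / at_risk
    return g

def ipcw_rmst_weights(times, events, tau):
    """IPCW weights for min(T, tau): nonzero only when min(T, tau) is actually observed."""
    y_tau = np.minimum(times, tau)
    observed = (events == 1) | (times >= tau)
    w = np.zeros(len(times))
    for i in np.flatnonzero(observed):
        w[i] = 1.0 / censoring_km_left(times, events, y_tau[i])
    return y_tau, w

def ipcw_rmst_loss(pred, times, events, tau):
    """Weighted squared-error loss whose population minimizer is E[min(T, tau) | x]."""
    y_tau, w = ipcw_rmst_weights(times, events, tau)
    return np.sum(w * (y_tau - pred) ** 2) / len(times)
```

In a generalized (Gibbs) posterior, a loss of this kind typically replaces the negative log-likelihood, so the posterior is proportional to exp(-eta * n * loss) times the prior for some learning rate eta.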

Keywords

Dependent censoring

Ensemble methods

Gibbs posterior

Inverse weighting

Loss function

Survival analysis

Co-Author

Nicholas Henderson

First Author

Mahsa Ashouri, Miami University

Presenting Author

Mahsa Ashouri, Miami University

Predictor-Informed Bayesian Nonparametric Clustering

In this project, we cluster observations so that cluster membership is influenced by a set of covariates. To that end, we employ the Bayesian nonparametric Common Atom Model (CAM), a nested clustering model that uses a fixed group membership for each observation to encourage members of the same group to be clustered similarly. CAM assumes that each group has its own vector of cluster probabilities, and these vectors are themselves clustered so that some groups share similar clusterings. We extend CAM by treating group membership as an unknown latent variable determined by the covariates. Thus, observations with similar predictor values fall in the same latent group and are more likely to be clustered together than observations with disparate predictors. We propose a Pyramid Group Model (PGM) that flexibly partitions the predictor space into these latent groups. The PGM operates similarly to a Bayesian CART process, except that it uses the same splitting rule at all nodes of the same tree depth. We propose a block Gibbs sampler for posterior inference. We demonstrate our methodology on simulated and real data. 
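Because every node at a given depth shares one splitting rule, a depth-D pyramid can be summarized by D (variable, threshold) pairs, and an observation's latent group is the binary string of its split outcomes, giving up to 2^D groups. The sketch below assumes every node at each depth is split and takes the rules as fixed rather than sampled by the block Gibbs sampler; the function names are hypothetical.

```python
from typing import List, Sequence, Tuple

def pyramid_group(x: Sequence[float], rules: List[Tuple[int, float]]) -> int:
    """Latent group index for one observation: bit d records whether x[var_d] > cut_d."""
    group = 0
    for var, cut in rules:  # one (variable, threshold) rule per depth, shared by all nodes at that depth
        group = 2 * group + (1 if x[var] > cut else 0)
    return group

def pyramid_groups(X: Sequence[Sequence[float]], rules: List[Tuple[int, float]]) -> List[int]:
    """Apply the shared depth-wise rules to every row of X."""
    return [pyramid_group(row, rules) for row in X]
```

With rules `[(0, 0.5), (1, 2.0)]`, the observation `[0.9, 1.0]` goes right at depth 0 and left at depth 1, landing in group 2 (binary 10). Sharing rules across a depth keeps the number of split parameters linear in D rather than exponential, as in a general CART tree.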

Keywords

Nonparametric

Clustering

Covariates

Latent group-membership

Pyramid Group Model

Block Gibbs sampler

Simulations

Real data 

Co-Author

Jeremy Gaskins, University of Louisville

First Author

Md Yasin Ali Parh

Presenting Author

Md Yasin Ali Parh