Monday, Aug 5: 8:30 AM - 10:20 AM
5030
Contributed Speed
Oregon Convention Center
Room: CC-D135
Presentations
Community detection is a fundamental task in network analysis, and learning underlying network structures has brought deep insights into the understanding of complex systems. Real network data often arise via a series of interactions, with each interaction involving more than two nodes (i.e., multi-way interactions), and the block structure can differ across interaction types. While many methods focus on clustering nodes into blocks, few account for the fact that the interactions themselves may exhibit clustering. In this project, we introduce a Bayesian nonparametric framework for multi-way interaction networks that jointly models latent node-level block labels and latent interaction-level labels. We discuss challenges regarding the identifiability of the latent labels in this framework and demonstrate the approach on simulated data. A Gibbs sampling-based algorithm is derived. We conclude with an application of the proposed method to Medicare claims data over multiple years and a discussion of potential medical implications.
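As a hedged illustration of the kind of sampler such a framework requires (the abstract gives no implementation details), the sketch below runs Gibbs updates for a toy model in which each hyperedge (multi-way interaction) has a latent type and each node a latent block; the fixed label counts K and L, the membership-probability matrix theta, and the simple categorical likelihood are all illustrative assumptions, not the authors' model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: hyperedges as tuples of node indices (multi-way interactions).
edges = [(0, 1, 2), (1, 2, 3), (4, 5, 6), (4, 6, 7), (0, 3, 5)]
n_nodes, K, L = 8, 2, 2              # node blocks K and interaction types L (fixed here)

z = rng.integers(0, K, n_nodes)      # latent node-level block labels
c = rng.integers(0, L, len(edges))   # latent interaction-level labels
theta = np.full((L, K), 1.0 / K)     # P(a member node is in block k | edge type l)

for sweep in range(200):
    # 1) Update each node's block label given the edge labels and theta.
    for i in range(n_nodes):
        logp = np.zeros(K)
        for e, members in enumerate(edges):
            if i in members:
                logp += np.log(theta[c[e]])
        p = np.exp(logp - logp.max()); p /= p.sum()
        z[i] = rng.choice(K, p=p)
    # 2) Update each interaction's label given the node labels and theta.
    for e, members in enumerate(edges):
        logp = np.array([np.log(theta[l, z[list(members)]]).sum() for l in range(L)])
        p = np.exp(logp - logp.max()); p /= p.sum()
        c[e] = rng.choice(L, p=p)
    # 3) Conjugate Dirichlet(1) update for theta given the current labels.
    counts = np.ones((L, K))
    for e, members in enumerate(edges):
        for i in members:
            counts[c[e], z[i]] += 1
    theta = np.vstack([rng.dirichlet(counts[l]) for l in range(L)])

# Labels are identified only up to permutation (the label-switching issue
# the abstract alludes to); summaries should be permutation-invariant.
print("node blocks:", z, "| interaction types:", c)
```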
Keywords
Community detection in network data
Bayesian non-parametric framework
Latent Class Model
Longitudinal studies and repeated measures are key in the study of correlated data. Irimata and Wilson (2017) presented a measure of these correlations when assessing the strength of association between an outcome of interest and multiple binary outcomes, as well as the clustering present due to correlation. They addressed this set of correlations in a hierarchical model with random effects. Estimation of parameters in such models is hampered by the association between time-dependent binary variables and the outcome of interest. Wilson, Vazquez, and Chen (2020) described marginal models for the analysis of correlated binary data with time-dependent covariates. Their research addressed carryover effects of responses on covariates and of covariates on responses through marginal models.
This research uses a random effects model with multiple outcomes to account for the changing impact of responses on covariates and of covariates on responses. It requires a series of distributions to address the time-dependent covariates, as each random effect relies on its own distribution. The approach differs from the marginal models of Wilson, Vazquez, and Chen in that it models feedback effects through random effects (a conditional model).
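As a small illustration of the feedback structure described above (not the authors' estimation procedure), the following sketch simulates longitudinal binary responses with a subject-level random intercept, where each response feeds back into the next period's covariate; the logistic link and all coefficient values are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

def logistic(u):
    return 1.0 / (1.0 + np.exp(-u))

n_subjects, n_times = 200, 5
b = rng.normal(0.0, 1.0, n_subjects)   # random intercepts (the conditional model)
beta_x, gamma_fb = 0.8, 0.5            # covariate effect; feedback of y onto next x

Y = np.zeros((n_subjects, n_times), dtype=int)
X = np.zeros((n_subjects, n_times))
X[:, 0] = rng.normal(size=n_subjects)
for t in range(n_times):
    # Binary response depends on the current covariate and the random effect.
    Y[:, t] = rng.binomial(1, logistic(b + beta_x * X[:, t]))
    if t + 1 < n_times:
        # Feedback: the response carries over onto the next covariate value.
        X[:, t + 1] = gamma_fb * Y[:, t] + rng.normal(scale=0.5, size=n_subjects)

print("correlation of y_t with x_{t+1}:",
      np.corrcoef(Y[:, :-1].ravel(), X[:, 1:].ravel())[0, 1].round(3))
```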
Keywords
Longitudinal studies
Correlation
Binary Models
In observational studies, the presence of unmeasured confounders is a crucial challenge in accurately estimating desired causal effects. To estimate the hazard ratio (HR) in Cox proportional hazards models, instrumental variable methods such as Two-Stage Residual Inclusion (Martinez-Camblor et al., 2019) and Limited Information Maximum Likelihood (Orihara, 2022) are typically employed. However, these methods have several concerns, including the potential for biased HR estimates and issues with parameter identification. In this presentation, we introduce a novel nonparametric Bayesian method designed to estimate an unbiased HR while addressing concerns related to parameter identification. Our proposed method consists of two phases: 1) detecting clusters based on the likelihood of the exposure variable, and 2) estimating the hazard ratio within each cluster. Although it is implicitly assumed that unmeasured confounders affect outcomes through cluster effects, our algorithm is well suited to such data structures. We will present simulation results to evaluate the performance of our method.
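A hedged sketch of the two-phase idea (cluster on the exposure model, then estimate the HR within clusters): the nonparametric Bayesian clustering of the actual method is replaced here by a finite Gaussian mixture as a stand-in, and the packages (scikit-learn, lifelines) and simulated data are illustrative assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.mixture import GaussianMixture
from lifelines import CoxPHFitter

rng = np.random.default_rng(2)
n = 1500
u = rng.integers(0, 2, n)                       # unmeasured confounder (cluster effect)
a = rng.normal(loc=1.5 * u, scale=1.0, size=n)  # exposure; its likelihood reflects u
lam = 0.1 * np.exp(0.5 * a + 1.0 * u)           # true HR for the exposure is exp(0.5)
t = rng.exponential(1.0 / lam)
cens = rng.exponential(10.0, n)
df = pd.DataFrame({"T": np.minimum(t, cens), "E": (t <= cens).astype(int), "A": a})

# Phase 1: cluster subjects using the exposure likelihood (finite-mixture stand-in).
df["cluster"] = GaussianMixture(n_components=2, random_state=0).fit_predict(df[["A"]])

# Phase 2: estimate the hazard ratio within each cluster.
for g, sub in df.groupby("cluster"):
    cph = CoxPHFitter().fit(sub[["T", "E", "A"]], duration_col="T", event_col="E")
    print(f"cluster {g}: HR = {np.exp(cph.params_['A']):.2f}")
```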
Keywords
general Bayes
instrumental variable
Mendelian randomization
nonparametric Bayes
unmeasured confounders
Mendelian randomization (MR) analysis is widely used in genetic epidemiology to estimate the causal effect of a risk factor on an outcome of interest. Increasing evidence shows the importance of sex differences in health and disease mechanisms. However, research on sex-specific causal effects is lacking due to limited sex-specific GWASs. Motivated by GWASs from the Million Veteran Program, in which only 10% of individuals are female, a major limitation for MR analyses is weak instrumental variables (IVs), which manifest as poor variant-exposure effect estimates that lead to unstable causal effect estimates. We propose a Bayesian framework to stabilize female exposure GWAS effect sizes by borrowing information from the male population. By specifying a particular prior distribution on the female exposure GWAS effect sizes, we demonstrate two special cases of the posterior mean, including the inverse variance-weighted meta-analysis and the adaptive weight approach. We perform a series of simulation studies to examine the performance of our proposed Bayesian approach in MR analysis. Finally, we apply the proposed method to estimate the causal effects of sleep phenotypes on cardiovascular-related diseases.
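A minimal numerical sketch of the borrowing idea: the inverse variance-weighted posterior mean below is one of the two special cases the abstract names, while the toy per-variant effect sizes, standard errors, and the downstream IVW MR estimator are assumptions for illustration.

```python
import numpy as np

# Toy variant-exposure summary statistics (assumed values, one entry per variant).
beta_f = np.array([0.08, 0.02, 0.11])   # female effect sizes (noisy: small sample)
se_f   = np.array([0.06, 0.05, 0.07])
beta_m = np.array([0.10, 0.04, 0.09])   # male effect sizes (precise: large sample)
se_m   = np.array([0.02, 0.02, 0.02])

# Special case of the posterior mean: inverse variance-weighted meta-analysis,
# i.e., a normal prior centered at the male estimate shrinks the female estimate.
w_f, w_m = 1 / se_f**2, 1 / se_m**2
beta_post = (w_f * beta_f + w_m * beta_m) / (w_f + w_m)
se_post = np.sqrt(1 / (w_f + w_m))

# Downstream two-sample MR: IVW causal estimate using the stabilized effects.
beta_y = np.array([0.05, 0.02, 0.04])   # variant-outcome effects (assumed)
se_y   = np.array([0.02, 0.02, 0.02])
w = beta_post**2 / se_y**2
theta_ivw = np.sum(w * beta_y / beta_post) / np.sum(w)
print("stabilized exposure effects:", beta_post.round(3))
print("IVW causal estimate:", round(theta_ivw, 3))
```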
Keywords
MR analysis
Bayesian framework
Sex-specific causal effect
Co-Author(s)
Nuzulul Kurniansyah, Department of Medicine, Brigham and Women’s Hospital
Daniel F Levey, Department of Psychiatry, Yale University School of Medicine
Joel Gelernter, Department of Psychiatry, Yale University School of Medicine
Jennifer Huffman, Center for Population Genomics, MAVERIC, VA Boston Healthcare System, Boston
Kelly Cho, Massachusetts Veterans Epidemiology Research and Information Center (MAVERIC), VA Boston Healthcare System
Peter Wilson, Division of Cardiology, Department of Medicine, Emory University School of Medicine
Daniel Gottlieb, Division of Sleep Medicine, Harvard Medical School
Kenneth Rice, University of Washington
Tamar Sofer, Beth Israel Deaconess Medical Center
First Author
Yu-Jyun Huang, Beth Israel Deaconess Medical Center
Presenting Author
Yu-Jyun Huang, Beth Israel Deaconess Medical Center
It is common to observe compositional data in various fields, with growing interest in treating compositional data as outcomes in regression settings. The motivation for this paper stems from a study investigating the impact of sleep restriction on physical activity outcomes. The compositional outcomes were measured under both short sleep and healthy sleep conditions for the same participants. To address the dependence observed in the compositional outcomes, we introduce a Mixed-Effects Dirichlet Regression (MEDR) model. This model is designed to account for correlated outcomes arising from repeated measurements on the same subject or clustering within a group. We utilize an alternative parameterization of the Dirichlet distribution, enabling the modeling of both mean and dispersion components. Our approach offers Markov chain Monte Carlo (MCMC) tools that are easily implementable in Stan and R. We apply the proposed MEDR model to an experimental sleep study and illustrate its performance through simulation studies.
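To make the alternative parameterization concrete: the sketch below evaluates a Dirichlet log-likelihood with mean vector mu (via a softmax-linked linear predictor) and precision phi, so that alpha = mu * phi; the random intercept, covariate, and coefficient values are illustrative assumptions, not the fitted MEDR model.

```python
import numpy as np
from scipy.special import gammaln

def softmax(eta):
    e = np.exp(eta - eta.max())
    return e / e.sum()

def dirichlet_loglik(y, mu, phi):
    """Dirichlet log-density under the mean/precision parameterization alpha = mu*phi."""
    alpha = mu * phi
    return gammaln(phi) - gammaln(alpha).sum() + ((alpha - 1) * np.log(y)).sum()

# One subject's compositional outcome (proportions of the day in 3 activity states).
y = np.array([0.55, 0.30, 0.15])            # assumed observation; sums to 1
x, b_subject = 1.0, 0.3                      # sleep-condition covariate, random intercept
B = np.array([[0.2, 0.5], [-0.1, 0.1], [0.0, 0.0]])  # coefficients (last row = reference)

eta = B[:, 0] + B[:, 1] * x + b_subject * np.array([1.0, 1.0, 0.0])
mu = softmax(eta)                            # mean component
phi = np.exp(1.5)                            # dispersion component (log-linked)
print("log-likelihood:", dirichlet_loglik(y, mu, phi).round(3))
```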
Keywords
Compositional data
Bayesian Dirichlet regression
Markov chain Monte Carlo
physical activity
sleep restriction
Co-Author(s)
Xia Wang, University of Cincinnati
Nanhua Zhang, Cincinnati Children's Hospital Medical Center
First Author
Eric Odoom, University of Cincinnati
Presenting Author
Eric Odoom, University of Cincinnati
When our immune systems encounter foreign invaders, the B cells that produce our antibodies undergo a cyclic process of mutation and selection, competing to provide a refined immune response to the specific invader. To study how the immune system recognizes when the antibodies are sufficiently improved, we examine the state of the immune system in mice after exposure to an artificial foreign agent by collecting genetic sequences of B cells. This experiment produces data at only one time point, so we lose all information about the preceding evolutionary process that mutates and selects B cells to optimize antibody efficiency. In this paper, we develop a multitype branching process model that integrates over unobserved antibody evolutionary histories and leverages the parallel replications of immune responses observed in our experiments. Our fully Bayesian approach, equipped with an efficient likelihood calculation algorithm and a Markov chain Monte Carlo-based approximation of the posterior, allows us to infer the currently unknown functional relationship between the fitness of B cells that produce antibodies and the binding strength of these antibodies to pathogen-infected cells.
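The following toy forward simulation conveys what a branching process with affinity-dependent fitness looks like; the discrete generations, mutation kernel, and the particular fitness function mapping binding strength to expected offspring are all illustrative assumptions, not the authors' inferential model (which integrates over such histories rather than simulating one).

```python
import numpy as np

rng = np.random.default_rng(3)

def fitness(affinity):
    # Assumed functional form: expected offspring rises with binding strength.
    return 2.0 * np.exp(affinity) / (1.0 + np.exp(affinity))

# Each B cell is represented by its (latent) binding affinity; start with naive cells.
cells = list(rng.normal(-1.0, 0.3, size=20))
for generation in range(8):
    next_gen = []
    for a in cells:
        n_offspring = rng.poisson(fitness(a))                   # selection
        next_gen += list(a + rng.normal(0, 0.2, n_offspring))   # somatic hypermutation
    cells = next_gen
    if not cells:
        break
print(f"{len(cells)} cells; mean affinity {np.mean(cells):.2f}"
      if cells else "lineage died out")
```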
Keywords
immunology
phylogenetics
phylodynamics
stochastic processes
Group testing is a procedure that tests pooled groups of biospecimens instead of individual ones. If a pool tests positive, subsequent tests are usually conducted on the individuals who contributed to the pool to determine their disease status; if a pool tests negative, all contributing individuals are considered disease-free. Under relatively low disease prevalence, group testing reduces the number of required diagnostic tests and the associated costs. Spatio-temporal dependencies can arise in testing data collected across multiple locations and time points, yet existing group testing models are not appropriate for spatio-temporal data. In this study, we propose two Bayesian spatio-temporal regression models for discrete-time areal group testing data. We apply the proposed models to COVID-19 testing data from 4,516 South Carolina residents (2020-2022) and 19,152 Central New York residents (2020). Our models are suitable for various group testing protocols and can estimate the sensitivity and specificity of diagnostic tests. Moreover, the models produce forecast maps of future infection prevalence. This study showcases the effectiveness of group testing in forecasting infectious diseases across different locations.
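A small simulation of the baseline two-stage (Dorfman-type) protocol the abstract builds on, showing the test savings under low prevalence; the prevalence, pool size, and assay sensitivity/specificity values are assumptions, and the spatio-temporal regression layer of the proposed models is not sketched here.

```python
import numpy as np

rng = np.random.default_rng(4)
n, pool_size, prevalence = 10_000, 5, 0.02
sens, spec = 0.95, 0.99                     # assumed assay operating characteristics

status = rng.random(n) < prevalence
tests = 0
for start in range(0, n, pool_size):
    pool = status[start:start + pool_size]
    # Pool-level result is subject to the assay's sensitivity/specificity.
    result = rng.random() < (sens if pool.any() else 1 - spec)
    tests += 1
    if result:                              # retest each contributor individually
        tests += len(pool)

print(f"tests used: {tests} vs {n} individual tests "
      f"({100 * (1 - tests / n):.1f}% savings)")
```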
Keywords
group testing
Bayesian spatio-temporal model
infectious disease forecasting
conditional autoregressive model
vector autoregressive model
COVID-19
We construct a framework combining Gaussian processes and hierarchical modeling to estimate and emulate dark matter power spectra from multiple, dependent computer model simulations. We model the spectra as deep Gaussian processes and consider multiple candidate models for the covariance structure of the simulations' deviations from the true spectra. Applying the best candidate model to the expensive simulations, we estimate the underlying power spectrum for a given cosmology. With these estimates calculated across multiple cosmologies, we build an emulator for unobserved cosmologies using functional principal components (with Gaussian processes on the weights). We obtain promising results in comparisons against an existing method.
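A compact sketch of the emulation stage (functional principal components plus independent GPs on the weights); the synthetic "spectra", the number of components, and the scikit-learn GP choice are assumptions standing in for the deep-GP machinery of the actual method.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(5)

# Synthetic stand-in: 30 cosmologies (2 parameters) -> power spectra on 100 k-bins.
params = rng.uniform(0, 1, size=(30, 2))
k = np.linspace(0.01, 1.0, 100)
spectra = np.array([p[0] * np.exp(-k / (0.2 + 0.5 * p[1])) for p in params])

# Functional PCA on the spectra; GPs map cosmology -> principal-component weights.
pca = PCA(n_components=3).fit(spectra)
weights = pca.transform(spectra)
gps = [GaussianProcessRegressor(kernel=RBF(length_scale=0.3)).fit(params, weights[:, j])
       for j in range(weights.shape[1])]

# Emulate the spectrum at an unobserved cosmology.
new_param = np.array([[0.4, 0.6]])
new_weights = np.array([gp.predict(new_param)[0] for gp in gps])
emulated = pca.mean_ + new_weights @ pca.components_
print("emulated spectrum (first 5 bins):", emulated[:5].round(4))
```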
Keywords
Gaussian processes
Bayesian modeling
Hierarchical modeling
Deep Gaussian processes
Cosmology
Optimal dynamic treatment regimes (DTRs) are sequences of decision rules that tailor the sequence of treatments to patients so as to maximize a long-term outcome. While conventional DTR estimation uses longitudinal data, there is little work on methods that use irregularly observed data to infer optimal DTRs. In this work, we first extend the target trial framework (a paradigm for estimating specified statistical estimands under hypothetical scenarios using observational data) to the DTR context; this extension allows treatment regimes to be defined with intervenable visit times. We propose an adapted version of G-computation that marginalizes over random effects for rewards that encapsulate a treatment strategy's value. To estimate components of the G-computation formula, we then articulate a Bayesian joint model to handle correlated random effects between the outcome, visit, and treatment processes. We also extend this model to allow flexible specifications of the random effects' distribution. Lastly, we show via simulation studies that failure to account for the observational treatment and visit processes produces bias in the estimation of regime rewards.
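A stripped-down sketch of G-computation with random effects marginalized by Monte Carlo: under an assumed outcome model with a subject-level random intercept, the value of a hypothetical regime ("treat when the biomarker exceeds a threshold") is estimated by simulating forward; the models, threshold, and coefficients are all illustrative, not the joint model of the abstract.

```python
import numpy as np

rng = np.random.default_rng(6)

def regime(biomarker, threshold=0.5):
    """Hypothetical decision rule: treat if the biomarker exceeds the threshold."""
    return (biomarker > threshold).astype(float)

def g_computation_value(n_mc=5000, n_visits=4):
    # Marginalize over random effects by drawing them, then simulate the regime.
    b = rng.normal(0.0, 0.7, n_mc)              # random intercepts
    biomarker = rng.normal(0.5, 1.0, n_mc)
    reward = np.zeros(n_mc)
    for visit in range(n_visits):
        a = regime(biomarker)
        # Assumed outcome model: treatment helps more when the biomarker is high.
        reward += 1.0 * a * biomarker - 0.2 * a + 0.3 * b
        biomarker = 0.6 * biomarker - 0.3 * a + rng.normal(0, 0.5, n_mc)
    return reward.mean()

print("estimated regime value:", round(g_computation_value(), 3))
```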
Keywords
Dynamic treatment regime
Bayesian joint modelling
Target Trial Framework
G-computation
Irregularly observed data
Place-based epidemiology studies often rely on circular buffers to define exposure at spatial locations. Buffers are a popular choice due to their simplicity and alignment with public health policies. However, the buffer radius is often chosen somewhat arbitrarily and assumed constant across space, which may result in biased effect estimates if these assumptions are violated. To address these limitations, we propose a novel method that informs buffer size selection and allows for spatial heterogeneity in radii across outcome units. Our model uses a spatially structured Gaussian process to model buffer radii as a function of covariates and spatial random effects, and a modified Bayesian variable selection framework to select the most appropriate radius distance. We perform a simulation study to understand the properties of the new method and apply it to a study of health care access and health outcomes in Madagascar. We find that our method outperforms existing approaches in terms of estimation and inference for key model parameters. By relaxing rigid assumptions about buffer characteristics, our method offers a flexible, data-driven approach to exposure definition.
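To ground the buffer idea, the sketch below computes a simple circular-buffer exposure (count of health facilities within radius r of each outcome location) and shows how sensitive the exposure is to r; the coordinates and radii are simulated assumptions, and the spatially varying radius model itself is not reproduced.

```python
import numpy as np

rng = np.random.default_rng(7)
facilities = rng.uniform(0, 10, size=(50, 2))   # health-facility coordinates (km)
homes = rng.uniform(0, 10, size=(5, 2))         # outcome-unit locations

def buffer_exposure(points, sources, radius):
    """Count sources within a circular buffer of the given radius around each point."""
    d = np.linalg.norm(points[:, None, :] - sources[None, :, :], axis=2)
    return (d <= radius).sum(axis=1)

for r in (0.5, 1.0, 2.0):                       # exposure is sensitive to the radius
    print(f"radius {r} km -> exposures {buffer_exposure(homes, facilities, r)}")
```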
Keywords
Bayesian methods
exposure buffers
geographic and spatial uncertainty
place-based epidemiology
health studies
Bayesian variable selection (BVS) is a powerful tool in high-dimensional settings, as it incorporates prior information and facilitates model selection simultaneously. However, the potential of side information, such as previous studies or expert knowledge, to identify influential variables is often underutilized in BVS applications. For example, in a study of genetic markers of the nicotine metabolite ratio, p-values from previous studies are available. These p-values may be useful in determining the sparsity structure of the regression coefficients and can enhance the accuracy of model results. Under the mean-field assumption, employing a spike-and-Gaussian-slab prior, variational Bayes (VB) with the coordinate ascent variational inference (CAVI) algorithm can be used to approximate the posterior distributions. To integrate side information into variable selection, we augment our sparse linear regression model with a conditional logistic model for the impact of the side information on the variable selection indicators. In this enhanced framework, the logistic component predominantly governs the prior inclusion probability within the spike-and-slab prior. Our simulation studies suggest that incorporating side information can improve variable selection accuracy.
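A hedged sketch of spike-and-slab CAVI in which the prior inclusion probabilities come from a logistic function of side information (here, -log10 p-values from a previous study); the update equations follow the standard mean-field derivation for a spike-and-Gaussian-slab linear model, and the fixed logistic weights, slab variance, and simulated data are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(8)
n, p = 200, 50
X = rng.normal(size=(n, p))
beta_true = np.zeros(p); beta_true[:3] = 1.0
y = X @ beta_true + rng.normal(size=n)

pvals = np.where(beta_true != 0, 1e-5, rng.uniform(0.05, 1, p))  # side information
w0, w1 = -3.0, 1.0                                   # fixed logistic weights (assumed)
prior_pi = 1 / (1 + np.exp(-(w0 + w1 * (-np.log10(pvals)))))

sigma2, sa2 = 1.0, 1.0                               # noise and relative slab variance
xtx = (X ** 2).sum(axis=0)
s2 = sigma2 / (xtx + 1 / sa2)                        # variational slab variances
mu = np.zeros(p)                                     # variational slab means
alpha = np.full(p, 0.1)                              # posterior inclusion probabilities

for sweep in range(50):                              # CAVI coordinate updates
    for j in range(p):
        r_j = y - X @ (alpha * mu) + X[:, j] * alpha[j] * mu[j]
        mu[j] = s2[j] / sigma2 * X[:, j] @ r_j
        logit = (np.log(prior_pi[j] / (1 - prior_pi[j]))
                 + 0.5 * np.log(s2[j] / (sigma2 * sa2))
                 + mu[j] ** 2 / (2 * s2[j]))
        alpha[j] = 1 / (1 + np.exp(-logit))

print("top inclusion probabilities:", np.round(alpha[:5], 3))
```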
Keywords
Bayesian variable selection
side information
variational inference
Predictive probability of success (PPoS) is a crucial decision-making tool that predicts trial success and is computed at various phases of the drug development process. We propose a Dirichlet process meta-analytic prior (DP-MAP), a nonparametric approach for calculating PPoS that accounts for statistical heterogeneity among the treatment effects across the historical studies considered when constructing an informative prior. It allows for more robust inference in cases of prior-data conflict. As the basic premise is to borrow only if the historical information is relevant, some prior trials may concur or conflict with the current data. The DP provides a flexible solution: it borrows from earlier trials in proportion to their similarity to the current trial, resolving the prior-data conflict.
In this paper, we assess the model fit of the DP-MAP prior and compare it with the model fit of both the standard meta-analytic predictive (MAP) prior and the robust meta-analytic predictive (rMAP) prior. We utilize a real data example from historical relapsed/refractory multiple myeloma (RRMM) trials and demonstrate PPoS calculations at the design stage and at an interim analysis of an ongoing trial.
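A minimal Monte Carlo sketch of a PPoS calculation at an interim analysis: the prior here is a simple two-component mixture over historical effects standing in for the DP-MAP construction, and the trial sizes, effect scale, and success criterion are assumed for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(9)

# Stand-in informative prior on the treatment effect: a two-component mixture
# over historical study effects (the DP-MAP prior generalizes this idea).
def draw_prior(size):
    comp = rng.random(size) < 0.7
    return np.where(comp, rng.normal(0.3, 0.1, size), rng.normal(0.0, 0.2, size))

n_interim, n_final, sd = 50, 200, 1.0
interim_data = rng.normal(0.25, sd, n_interim)       # assumed interim observations

# Posterior at interim: importance-weight prior draws by the interim likelihood.
theta = draw_prior(100_000)
logw = stats.norm.logpdf(interim_data.mean(), theta, sd / np.sqrt(n_interim))
w = np.exp(logw - logw.max()); w /= w.sum()

# PPoS: simulate the remaining patients, apply the final success criterion (z > 1.96).
theta_post = rng.choice(theta, size=20_000, p=w)
rest = rng.normal(theta_post, sd / np.sqrt(n_final - n_interim))
final_mean = (n_interim * interim_data.mean() + (n_final - n_interim) * rest) / n_final
z = final_mean / (sd / np.sqrt(n_final))
print("PPoS:", round(np.mean(z > 1.96), 3))
```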
Keywords
Dirichlet process prior
Predictive probability of success
interim analysis
clinical trials
Bayesian statistics
go-no go decision
Finite Gaussian mixture models are ubiquitous in model-based clustering of continuous data, but their parameters scale quadratically with the number of variables. A rich literature exists on parsimonious models via covariance matrix decompositions or other structural assumptions; however, these models do not allow for direct estimation of conditional independencies via sparse precision matrices. Here, we introduce mixtures of Gaussian graphical models for model-based clustering with sparse precision matrices. We employ recent developments in Bayesian estimation of Gaussian graphical models to circumvent the doubly intractable partition function of the G-Wishart distribution and use conditional Bayes factors for model comparison in a Metropolis-Hastings framework. We extend these techniques to mixtures of Gaussian graphical models, estimating conditional independence structures in the different mixture components via fast joint estimation of the graphs and precision matrices. Our framework results in a parsimonious model-based clustering of the data and provides conditional independence interpretations of the mixture components.
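To illustrate cluster-specific conditional independence (with a penalized-likelihood stand-in rather than the G-Wishart/conditional-Bayes-factor machinery of the abstract), the sketch below alternates hard cluster assignment with sparse precision estimation via the graphical lasso; the cluster count, penalty, and data are assumptions.

```python
import numpy as np
from scipy.stats import multivariate_normal
from sklearn.covariance import GraphicalLasso

rng = np.random.default_rng(10)
# Two clusters with different conditional independence structures.
X = np.vstack([
    rng.multivariate_normal([0, 0, 0], [[1, .7, 0], [.7, 1, 0], [0, 0, 1]], 150),
    rng.multivariate_normal([3, 3, 3], [[1, 0, .6], [0, 1, 0], [.6, 0, 1]], 150),
])

z = rng.integers(0, 2, len(X))                       # initial cluster labels
for it in range(10):                                 # simple EM-style alternation
    models = []
    for g in (0, 1):
        gl = GraphicalLasso(alpha=0.1).fit(X[z == g])
        models.append((X[z == g].mean(axis=0), gl.covariance_, gl.precision_))
    logd = np.column_stack([
        multivariate_normal.logpdf(X, mean=m, cov=C) for m, C, _ in models])
    z = logd.argmax(axis=1)                          # hard reassignment

for g, (_, _, P) in enumerate(models):
    print(f"cluster {g} sparse precision (zeros = conditional independence):\n",
          np.round(P, 2))
```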
Keywords
Model-based clustering
Finite Gaussian mixture models
Precision matrix
Gaussian graphical model
Markov chain Monte Carlo (MCMC)
G-Wishart Distribution
Constraints on parameter spaces promote various structures in Bayesian inference. Simultaneously, they present methodological challenges, such as efficiently sampling from the posterior. While recent work has tackled this important problem through various approaches to constraint relaxation, much of the underlying machinery assumes the parameter space is Euclidean, an assumption that does not hold in many settings. Building on the recently proposed class of distance-to-set priors (Presman and Xu, 2023), this talk explores extensions of constraint relaxation to non-Euclidean spaces. We propose a natural extension of these priors, which we call (Bregman) divergence-to-set priors, exemplify many settings where they can be leveraged, and demonstrate how techniques originating from an optimization algorithm known as mirror descent can be utilized for non-Euclidean Bayesian constraint relaxation.
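A small sketch of the distance-to-set idea that divergence-to-set priors generalize: an unnormalized log-posterior penalizes the squared Euclidean distance from theta to a constraint set (here the probability simplex, with the projection computed exactly), sampled by random-walk Metropolis; the Gaussian pseudo-likelihood, penalty strength rho, and constraint set are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(11)

def project_simplex(v):
    """Euclidean projection onto the probability simplex (sort-based algorithm)."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    idx = np.nonzero(u - css / np.arange(1, len(v) + 1) > 0)[0][-1]
    tau = css[idx] / (idx + 1)
    return np.maximum(v - tau, 0.0)

def log_post(theta, y, rho=50.0):
    # Gaussian pseudo-likelihood plus a distance-to-set penalty toward the simplex.
    d = np.linalg.norm(theta - project_simplex(theta))
    return -0.5 * np.sum((y - theta) ** 2) - rho * d ** 2

y = np.array([0.6, 0.3, 0.2])                    # data pulling theta off the simplex
theta = np.array([0.3, 0.3, 0.4])
samples = []
for it in range(5000):                           # random-walk Metropolis
    prop = theta + rng.normal(0, 0.05, 3)
    if np.log(rng.random()) < log_post(prop, y) - log_post(theta, y):
        theta = prop
    samples.append(theta)
print("posterior mean:", np.mean(samples[1000:], axis=0).round(3))
```

Swapping the squared Euclidean distance for a Bregman divergence (e.g., KL divergence on the simplex) is the extension the talk develops.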
Keywords
Constraint relaxation
Hamiltonian Monte Carlo
Bregman divergence
MCMC Sampler
The multinomial probit model is a popular tool for examining nominal categorical data. However, an identification issue, which requires restricting the first element of the covariance matrix of the latent variables, makes it a daunting challenge to develop efficient Markov chain Monte Carlo (MCMC) methods for this model. Parameter-expanded data augmentation (PX-DA) is a well-known technique that introduces a working/artificial parameter or parameter vector to transform an identifiable model into a non-identifiable one; this transformation can improve the mixing and convergence of the data augmentation components. Hence, we propose a PX-DA algorithm for analyzing categorical data with multinomial probit models. We examine both identifiable and non-identifiable multinomial probit models and develop the corresponding MCMC algorithms. The constructed non-identifiable model successfully bypasses a Metropolis-Hastings algorithm for sampling the covariance matrix, resulting in enhanced convergence and improved mixing of the MCMC components. We conduct simulation studies to demonstrate our proposed methods and apply them to real data from the Six Cities study.
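For a concrete sense of parameter expansion, this sketch implements the classic PX-DA algorithm for the simpler binary probit model (in the spirit of Liu and Wu, 1999): the extra step rescales the latent utilities by a working scale parameter drawn from its conditional, which is the mechanism the abstract extends to the multinomial probit setting; the flat prior and simulated data are assumptions.

```python
import numpy as np
from scipy.stats import truncnorm

rng = np.random.default_rng(12)
n, p = 300, 2
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([-0.5, 1.0])
y = (X @ beta_true + rng.normal(size=n) > 0).astype(int)

XtX_inv = np.linalg.inv(X.T @ X)
H = X @ XtX_inv @ X.T
beta = np.zeros(p)
draws = []
for it in range(2000):
    # 1) Data augmentation: latent utilities truncated by the observed labels.
    m = X @ beta
    lo = np.where(y == 1, -m, -np.inf)
    hi = np.where(y == 1, np.inf, -m)
    z = m + truncnorm.rvs(lo, hi, size=n, random_state=rng)
    # 2) Parameter expansion: rescale z by a working parameter g (Haar-prior step);
    #    g^2 | z ~ Gamma(n/2, rate = residual sum of squares / 2).
    rss = z @ z - z @ H @ z
    g = np.sqrt(rng.gamma(n / 2.0, 2.0 / rss))
    z = g * z
    # 3) Update beta given the rescaled latent utilities (flat prior).
    beta = rng.multivariate_normal(XtX_inv @ X.T @ z, XtX_inv)
    draws.append(beta)
print("posterior mean beta:", np.mean(draws[500:], axis=0).round(2))
```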
Keywords
multinomial probit model
latent variable
parameter-expanded
data augmentation
MCMC
non-identifiable model
During the COVID-19 outbreak, the global community encountered numerous challenges, underscoring the necessity for effective prediction models to inform public health interventions and optimize resource allocation. Traditional compartmental models like the SIR (Susceptible-Infected-Recovered) model and its variants have been employed to predict disease prevalence. However, these models have limitations: they struggle to detect multiple waves and are sensitive to initial parameters, necessitating time-consuming parameter tuning. In this study, we propose an approach to identify multi-wave patterns in COVID-19 cases. Our method uses Bayesian changepoint detection to identify multiple waves, followed by the application of a logistic growth model to estimate daily COVID-19 cases, including hospitalizations and ICU patients. We evaluate the model's accuracy using the mean absolute percentage error (MAPE).
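A minimal sketch of the changepoint half of such a pipeline: an exact Bayesian posterior over a single changepoint in Poisson-distributed daily counts under conjugate Gamma priors, after which each detected segment could be fit with a logistic growth curve; the synthetic data, priors, and single-changepoint restriction are illustrative assumptions.

```python
import numpy as np
from scipy.special import gammaln

rng = np.random.default_rng(13)
# Synthetic daily case counts with a rate shift (two waves).
y = np.concatenate([rng.poisson(20, 60), rng.poisson(80, 60)])
n = len(y)
a0, b0 = 1.0, 0.1                      # Gamma(a0, b0) prior on each segment's rate

def log_marginal(seg):
    """Log marginal likelihood of a Poisson segment under a conjugate Gamma prior."""
    s, m = seg.sum(), len(seg)
    return (a0 * np.log(b0) - gammaln(a0) + gammaln(a0 + s)
            - (a0 + s) * np.log(b0 + m) - gammaln(seg + 1).sum())

# Posterior over the changepoint location (uniform prior over interior days).
logp = np.array([log_marginal(y[:t]) + log_marginal(y[t:]) for t in range(1, n)])
post = np.exp(logp - logp.max()); post /= post.sum()
print("posterior mode of changepoint day:", int(np.argmax(post)) + 1)
```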
Keywords
SIR model
Bayesian changepoint
Mean Absolute Percentage Errors
Especially when facing reliability data with limited information (e.g., a small number of failures), there are strong motivations for using Bayesian inference methods. These include the option to use information from physics-of-failure or previous experience with a failure mode in a particular material to specify an informative prior distribution. Another advantage is the ability to make statistical inferences without having to rely on the specious (when the number of failures is small) asymptotic theory needed to justify non-Bayesian methods. Users of non-Bayesian methods are faced with multiple methods of constructing uncertainty intervals (Wald, likelihood, and various bootstrap methods) that can give substantially different answers when there is little information in the data. For Bayesian inference, there is only one method, but it is necessary to provide a prior distribution to fully specify the model. This presentation reviews some of this work and provides, evaluates, and illustrates principled extensions and adaptations of these methods to the practical realities of reliability data (e.g., non-trivial censoring).
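A compact sketch of the kind of analysis described: a grid-based posterior for a right-censored Weibull model with only a few failures, where an informative lognormal prior on the Weibull shape (as might come from physics-of-failure knowledge) stabilizes the inference; the data, prior parameters, and grid are all assumed.

```python
import numpy as np
from scipy import stats

# Assumed life-test data: 3 failures and 17 right-censored units (hours).
t_fail = np.array([412.0, 530.0, 870.0])
t_cens = np.full(17, 1000.0)

# Parameter grid for the Weibull shape (beta) and scale (eta).
beta = np.linspace(0.5, 5.0, 200)[:, None]
eta = np.linspace(200.0, 5000.0, 400)[None, :]

# Right-censored Weibull log-likelihood.
ll = sum(stats.weibull_min.logpdf(t, beta, scale=eta) for t in t_fail)
ll += sum(stats.weibull_min.logsf(t, beta, scale=eta) for t in t_cens)

# Informative lognormal prior on the shape (wear-out expected: beta near 2);
# flat prior on the scale over the grid.
log_prior = stats.lognorm.logpdf(beta, s=0.3, scale=2.0)
post = np.exp(ll + log_prior - (ll + log_prior).max())
post /= post.sum()

# Posterior summary for the shape parameter.
beta_marg = post.sum(axis=1)
print("posterior mean of Weibull shape:",
      round(float((beta.ravel() * beta_marg).sum()), 2))
```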
Keywords
Bayesian inference
default prior
Reliability
few failures
noninformative prior
reference prior