Sunday, Aug 3: 4:00 PM - 5:50 PM
4019
Contributed Papers
Music City Center
Room: CC-205C
This session showcases the latest genomics, microbiome, metabolomics, and sequencing research, with an emphasis on Bayesian methodology incorporated into these research areas.
Main Sponsor
Biometrics Section
Presentations
In microscopic images of cells, various cell populations often co-exist in a particular tissue, forming highly spatially structured communities where different taxa interact at micrometer scales. Quantifying the spatial relationships of microbes is essential for uncovering biofilm functions and biological mechanisms. Multivariate log Gaussian Cox processes are flexible models for the analysis of multivariate point patterns. However, they have so far been applied only to single realizations (i.e., single images), ignoring similarity and dissimilarity across images. We move beyond this limitation to model spatial interactions among multiple object types, integrating multi-level images from multiple subjects. In particular, we propose a unified hierarchical multivariate log Gaussian Cox process framework for multi-level image data from multiple subjects with a global governing process, providing a comprehensive quantification of the multivariate spatial relationships among object types. The proposed framework is appealing because it quantifies both within-sample and across-sample variability and derives global and subject-level inter-type spatial relationships simultaneously.
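For readers unfamiliar with the model class, the standard single-image multivariate log Gaussian Cox process can be sketched as below. The notation ($N_k$, $\Lambda_k$, $\mu_k$, $Z_k$, $C_{kl}$) and the stationarity and isotropy assumptions are introduced here for illustration only; the block does not reproduce the authors' hierarchical specification.

```latex
\begin{aligned}
N_k(B) \mid \Lambda_k &\sim \text{Poisson}\!\left(\int_B \Lambda_k(s)\,ds\right), \qquad k = 1,\dots,K,\\
\Lambda_k(s) &= \exp\{\mu_k + Z_k(s)\},\\
\operatorname{Cov}\{Z_k(s), Z_l(s')\} &= C_{kl}(\lVert s - s' \rVert), \qquad (Z_1,\dots,Z_K)\ \text{a zero-mean multivariate Gaussian process},\\
g_{kl}(r) &= \exp\{C_{kl}(r)\}.
\end{aligned}
```

Here $N_k(B)$ is the number of type-$k$ objects in a region $B$, and $g_{kl}$ is the cross pair correlation function between types $k$ and $l$. Values of $g_{kl}(r)$ above 1 indicate attraction at distance $r$, and values below 1 indicate repulsion.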
Keywords
Microbiome Biofilm Image
Cross-pair Correlation
Log Gaussian Cox Process
Multivariate Point Process
Spatial Ecology
The study of the tumor immune microenvironment (TIME) has been revolutionized by advances in spatial proteomic imaging techniques. These techniques assess multiple markers simultaneously to distinguish immune cell populations in the TIME. The analysis of these immune profiles has become increasingly important with the progress of immunotherapy treatments. We account for the over-dispersed nature of the cell count data by modeling the counts with a beta-binomial distribution. To account for the correlation between the different cell populations in the TIME (e.g., T cells and cytotoxic T cells), we developed a Bayesian hierarchical beta-binomial model. The model can accommodate different covariance (or relationship) structures between the immune cell populations to reflect immune differentiation paths. To illustrate the model and the covariance structures that are possible, we apply it to spatial proteomic data from three large epidemiologic cohorts (N = 486) examining the TIME of ovarian cancer.
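The overdispersion argument can be made concrete with a small sketch. It assumes a hypothetical image with `n_cells` cells and a Beta(`a`, `b`) distribution on the proportion of one cell type; these names and numbers are illustrative and are not taken from the study. With the mean held fixed, the beta-binomial variance exceeds the binomial variance by the factor (a + b + n) / (a + b + 1).

```python
def binomial_variance(n, p):
    """Variance of a Binomial(n, p) count."""
    return n * p * (1.0 - p)


def beta_binomial_variance(n, a, b):
    """Variance of a count that is Binomial(n, p) with p ~ Beta(a, b)."""
    p = a / (a + b)                          # marginal mean proportion
    inflation = (a + b + n) / (a + b + 1.0)  # overdispersion factor (>= 1)
    return n * p * (1.0 - p) * inflation


# Hypothetical example: 200 cells in one image, Beta(2, 6) on the proportion.
n_cells, a, b = 200, 2.0, 6.0
var_bin = binomial_variance(n_cells, a / (a + b))
var_bb = beta_binomial_variance(n_cells, a, b)
print(var_bin, var_bb)
```

Both distributions share the mean count `n_cells * a / (a + b)`, so the gap between the two variances is the extra between-image variability that a plain binomial model would miss.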
Keywords
Bayesian
beta-binomial model
covariance structures
hierarchical
spatial protein imaging data
tumor immune microenvironment
Co-Author(s)
Alex Soupir, Biostatistics and Bioinformatics Shared Resource, Moffitt Cancer Center
Mary Townsend, Division of Oncological Sciences, Knight Cancer Institute Oregon Health and Science University
Jose Laborde, Moffitt Cancer Center
Courtney Johnson, Emory University
Andrew Lawson, Medical University of South Carolina, College of Medicine
Joellen Schildkraut, Emory University
Shelley Tworoger, Moffitt Cancer Center
Kathryn Terry, Brigham and Women’s Hospital and Harvard Medical School
Lauren Peres, Moffitt Cancer Center
Brooke Fridley, Children's Mercy
First Author
Chase Sakitis, Children's Mercy
Presenting Author
Chase Sakitis, Children's Mercy
The gut microbiome influences cancer therapy responses, particularly immunotherapies, by shaping the metabolome. While some studies examine specific microbial genera and metabolites, little work identifies key genera driving overall metabolome profiles. To address this, we introduce B-MASTER (Bayesian Multivariate Analysis for Selecting Targeted Essential Regressors), a fully Bayesian framework with L1 and L2 penalties for sparsity and shrinkage, paired with a scalable Gibbs sampler. B-MASTER efficiently performs full posterior inference for models with up to four million parameters. Using this approach, we identify key microbial genera shaping metabolite profiles and analyze their relevance to colorectal cancer.
Keywords
Bayesian penalized regression
Gibbs sampling
Scalable high-dimensional models
Microbiome-metabolites dynamics
Colorectal cancer
High-throughput sequencing technologies in microbiome, transcriptome, and genome studies have produced massive omics datasets, where the primary outcomes are either count data (e.g., RNA-seq) or relative abundance data (e.g., microbial taxa proportions). We aim to model such data collected in longitudinal studies. Unlike time-course (time series) data, which track realizations of stochastic processes, longitudinal data are sparse and subject-specific. Biomarker interactions—such as correlated metabolites in diabetes studies—can enhance detection power. However, fully multivariate models for serial measurements pose high-dimensional estimation challenges. A practical alternative for univariate outcomes is to incorporate random effects into fixed-effect models, such as linear or generalized linear mixed models (GLMMs). A widely adopted approach employs the negative binomial distribution to account for overdispersion in count data. However, this model is inappropriate for relative abundance data, which are continuous, non-negative, and often zero-inflated—violating the discrete nature assumed by the negative binomial distribution.
Meanwhile, the widely used Benjamini-Hochberg $p$-value adjustment addresses the multiple-testing burden in high-dimensional settings but does not yield an estimation or predictive model. Thus, there is a clear need for efficient GLMM estimation techniques in high-dimensional contexts—an area previously addressed in the literature, but typically under normality assumptions or limited to select distributions from the exponential dispersion family.
In most omics applications, microbiome, transcriptome, and genome data are normalized by total count, resulting in relative abundance values. These values lie in [0,1] and reflect compositional rather than raw count data. Modeling such data with a negative binomial distribution violates key assumptions, misrepresents zeros caused by detection limits or true absence, and fails to account for compositional constraints or batch effects that influence library size. Moreover, omics datasets often exhibit sparsity (high proportions of zeros) and skewness, particularly due to inter-sample variability, sequencing depth, and preprocessing thresholds. These characteristics necessitate statistical models capable of handling both zero-inflation and continuous positive values.
To address these challenges, we assume that the $j$th measurement for subject $i$, conditional on the random effects, follows a Tweedie distribution with mean $\mu_{ij}$ and unknown dispersion and index parameters. The mean is linked to both fixed and random effects via a log link function.
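The model described above can be written compactly as follows. The symbols $x_{ij}$, $z_{ij}$, $\beta$, $b_i$, $\phi$, and $p$ are notation assumed here for illustration (fixed-effect covariates, random-effect covariates, fixed effects, subject-level random effects, dispersion, and Tweedie index); the abstract itself only names $\mu_{ij}$.

```latex
\begin{aligned}
Y_{ij} \mid b_i &\sim \text{Tweedie}(\mu_{ij}, \phi, p),\\
\log \mu_{ij} &= x_{ij}^\top \beta + z_{ij}^\top b_i,\\
\operatorname{Var}(Y_{ij} \mid b_i) &= \phi\,\mu_{ij}^{\,p},\\
\Pr(Y_{ij} = 0 \mid b_i) &= \exp\!\left\{-\frac{\mu_{ij}^{\,2-p}}{\phi\,(2-p)}\right\} \qquad (1 < p < 2).
\end{aligned}
```

For $1 < p < 2$ the Tweedie distribution is a compound Poisson-gamma distribution, which places positive probability on exact zeros and is continuous on the positive values. This is why it suits zero-inflated relative abundance data.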
A major obstacle in applying standard LASSO to omics-scale data is computational inefficiency. We instead perform regularized quasi-likelihood estimation using $l_1$ regularization within a Bayesian framework. We assume that each regression coefficient follows a double-exponential (Laplace) prior, such that the maximum a posteriori (MAP) estimate under the quasi-likelihood corresponds to a regularized quasi-maximum likelihood solution. To address scalability issues, we implement an efficient MCMC algorithm that leverages posterior sampling to improve computational performance. Unlike standard least-squares or penalized likelihood approaches—which often fail under high dimensionality and zero-inflation—our MCMC method accommodates large covariate spaces, efficiently explores the posterior distribution under non-Gaussian outcomes, and ensures robust convergence even in the presence of singularities.
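The correspondence between the Laplace prior and $l_1$ regularization mentioned above can be sketched as follows. Here $Q(\beta; y)$ denotes the log quasi-likelihood, $\lambda$ the prior rate, and $K$ the number of coefficients; this notation is assumed for illustration, and independent priors with a common $\lambda$ are a simplifying assumption.

```latex
\begin{aligned}
\pi(\beta_k \mid \lambda) &= \frac{\lambda}{2}\exp(-\lambda\,|\beta_k|), \qquad k = 1,\dots,K,\\
\log \pi(\beta \mid y) &= Q(\beta; y) - \lambda \sum_{k=1}^{K} |\beta_k| + \text{const},\\
\hat{\beta}_{\text{MAP}} &= \arg\max_{\beta}\,\bigl\{\, Q(\beta; y) - \lambda \lVert \beta \rVert_1 \,\bigr\}.
\end{aligned}
```

Maximizing the quasi-posterior is therefore the same optimization problem as the $l_1$-penalized quasi-likelihood, with the prior rate $\lambda$ playing the role of the penalty weight.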
We benchmark our method through simulations that evaluate bias, sparsity recovery, and convergence across varying degrees of zero-inflation and sequencing depth. We also apply our method to a real transcriptomic dataset with associated treatment and clinical metadata, demonstrating improved model fit and interpretability compared to negative binomial-based models.
Keywords
Bayesian lasso
compound Poisson distribution
generalized linear mixed model
longitudinal omics data
Tweedie family
Considerable progress has been made in quantifying the heritability of cross-sectional traits, but analyzing longitudinal phenotypic trajectories remains challenging. This study introduces a mixed model integrating genome-wide genetic variants to disentangle heritability metrics on baseline trait levels and rates of change over time, providing insights into both static and dynamic aspects of traits. Key challenges stem primarily from the potential for large-scale studies, truncated estimates due to limited measurements per subject, and joint genetic effects. To address these complexities, we compare the average information restricted maximum likelihood algorithm, augmented with meta-analysis to tackle truncation, with the restricted Haseman-Elston regression approach, which avoids reliance on precision matrix computations. Using these approaches, we analyzed 6,948,674 genome-wide common variants to study PSA trajectories in males from the PLCO Screening Trial. Our findings reveal moderate heritability of baseline PSA levels but significant heritability of PSA velocity, underscoring an increasing heritability trend with age and enabling more accurate prediction of disease risk.
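To illustrate why a Haseman-Elston-type estimator needs no matrix inversion, here is a minimal sketch of the classic, unrestricted, cross-sectional Haseman-Elston regression. It is not the restricted (REHE) method or the longitudinal model of the abstract. The function name `haseman_elston_h2` and the toy inputs are hypothetical, and the phenotype is assumed to be centered and scaled beforehand.

```python
def haseman_elston_h2(grm, y):
    """Classic Haseman-Elston heritability estimate.

    grm : n x n genetic relationship matrix (list of lists)
    y   : phenotype values, assumed already centered and scaled

    Regresses the products y_i * y_j on grm[i][j] over pairs i < j
    (no intercept), so no matrix inverse is ever computed.
    """
    num = 0.0
    den = 0.0
    n = len(y)
    for i in range(n):
        for j in range(i + 1, n):
            num += grm[i][j] * y[i] * y[j]
            den += grm[i][j] ** 2
    if den == 0.0:
        raise ValueError("GRM has no off-diagonal relatedness")
    return num / den


# Toy values chosen only so the arithmetic is easy to follow.
toy_grm = [[1.0, 0.5, 0.0],
           [0.5, 1.0, 0.0],
           [0.0, 0.0, 1.0]]
toy_y = [0.5, 0.5, -1.0]
h2_hat = haseman_elston_h2(toy_grm, toy_y)
print(h2_hat)
```

The cost is one pass over the pairs of subjects. Likelihood-based alternatives such as AI-REML instead have to factor or invert an n x n covariance matrix at each iteration.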
Keywords
AI-REML algorithm
truncation
REHE method
heritability
PSA level
large-scale studies
The study of protein–protein interactions (PPIs) provides insight into various biological mechanisms, including the binding of antibodies to antigens, enzymes to inhibitors or promoters, and receptors to ligands. Recent studies of PPIs have led to significant biological breakthroughs. Graphical models are useful tools for understanding complex biological relationships between biomolecules in high-dimensional data. Nevertheless, their current usability is limited by computational constraints, particularly in a Bayesian estimation paradigm when handling large multiclass datasets such as those common in biology. Here, we introduce a clustering-focused iterative (CFI) methodology designed to enhance the scalability and accuracy of multiple Gaussian Graphical Model (GGM) estimation in high-dimensional spaces. Further, we present a framework for a Bayesian graphical model that allows group-specific prior distributions to be specified, leading to improved model accuracy. We present results from simulation studies as well as a real-world application to data from host-response mass spectrometry studies.
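As background on how a Gaussian graphical model encodes a network, the sketch below reads edges and partial correlations off a precision (inverse covariance) matrix. It illustrates the general GGM definition only, not the CFI methodology; the function names and the toy matrix are hypothetical.

```python
import math


def partial_correlations(precision):
    """Partial correlation matrix implied by a precision matrix.

    rho[i][j] = -omega[i][j] / sqrt(omega[i][i] * omega[j][j]) for i != j.
    """
    p = len(precision)
    rho = [[1.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(p):
            if i != j:
                scale = math.sqrt(precision[i][i] * precision[j][j])
                rho[i][j] = -precision[i][j] / scale
    return rho


def graph_edges(precision, tol=1e-12):
    """Pairs (i, j), i < j, whose precision entry is non-zero.

    In a GGM, a zero entry means the two variables are conditionally
    independent given all the others, so no edge is drawn.
    """
    p = len(precision)
    return [(i, j) for i in range(p) for j in range(i + 1, p)
            if abs(precision[i][j]) > tol]


# Toy 3-variable chain: variable 0 -- variable 1 -- variable 2.
toy_precision = [[2.0, -1.0, 0.0],
                 [-1.0, 2.0, -1.0],
                 [0.0, -1.0, 2.0]]
toy_rho = partial_correlations(toy_precision)
toy_edges = graph_edges(toy_precision)
print(toy_edges)
```

In the toy chain, variables 0 and 2 are marginally correlated but have no edge, because their dependence runs entirely through variable 1.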
Keywords
graphical model
Bayesian
omics data
Graphical models are powerful tools to investigate complex dependency structures in high-throughput datasets. However, most existing graphical models make one of two canonical assumptions: (1) a homogeneous graph with a common network for all subjects or (2) normality, especially in the context of Gaussian graphical models. Both assumptions are restrictive and can fail in certain applications such as proteomic networks. We propose an approach termed robust Bayesian graphical regression (rBGR) to estimate heterogeneous graphs for non-normally distributed data. rBGR is a flexible framework that accommodates non-normality by random marginal transformations and constructs covariate-dependent graphs to accommodate heterogeneity via graphical regressions. We formulate a new characterization of dependencies, conditional sign independence with covariates, with an efficient sampler. Simulation studies show that rBGR outperforms existing graphical models in both edge and covariate selection for data with various levels of non-normality. We use rBGR to assess proteomic networks and find protein-protein interactions that are differentially associated with immune cell abundance.
Keywords
Bayesian graphical models
Cancer
Conditional sign independence
Covariate-dependent graphs
Protein-protein interactions
First Author
Tsung-Hung Yao, The University of Texas MD Anderson Cancer Center
Presenting Author
Tsung-Hung Yao, The University of Texas MD Anderson Cancer Center