Sunday, Aug 3: 4:00 PM - 5:50 PM
0821
Topic-Contributed Paper Session
Music City Center
Room: CC-207D
Applied
Yes
Main Sponsor
Biometrics Section
Co Sponsors
Section on Statistics in Epidemiology
WNAR
Presentations
The human microbiome undergoes dynamic shifts in composition over time, exemplified by rapid changes in newborns or by shifts following dietary changes. Modeling the influence of exposures or treatments on microbial composition over time is essential to understanding the factors that drive these transitions. Dirichlet-multinomial (DM) regression models are often used to investigate relations between observed covariates and microbial data because they accommodate overdispersion and the compositional structure of the data. However, traditional DM regression models are not equipped to handle repeated measures, ignore the zero-inflation characteristic of microbiome data, and assume that the effect of a covariate is constant throughout the study period. Alternative methods for modeling longitudinal microbiome data, meanwhile, often overlook the compositional structure of the data and time-varying effects. To fill these gaps, we propose a functional concurrent zero-inflated Dirichlet-multinomial (FunC-ZIDM) regression model, designed to model time-varying relations between observed covariates and microbial taxa while accounting for zero-inflation, compositional structure, and repeated measures. Through simulation, we demonstrate the model's ability to estimate the relative abundance of compositional elements and to scale to large compositional spaces. We apply the model to investigate time-varying associations between infants' microbial composition and both breast milk intake and gestational age at birth during the 11-week postnatal period.
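To make the likelihood ingredients concrete, the sketch below computes a Dirichlet-multinomial log-probability, a zero-inflated variant in which each taxon is "at risk" (present) with some probability and taxa not at risk have zero latent probability, and a concurrent time-varying concentration. It is a minimal sketch of one common ZIDM formulation, not the FunC-ZIDM implementation: the function names, the log link on the concentrations, and the quadratic polynomial basis for the coefficient functions are illustrative assumptions, since the abstract does not state the authors' basis or priors.

```python
import itertools
import math


def dm_logpmf(y, alpha):
    """Log-pmf of the Dirichlet-multinomial for counts y and concentrations alpha."""
    n, a = sum(y), sum(alpha)
    out = math.lgamma(n + 1) + math.lgamma(a) - math.lgamma(n + a)
    for yk, ak in zip(y, alpha):
        out += math.lgamma(yk + ak) - math.lgamma(yk + 1) - math.lgamma(ak)
    return out


def zidm_logpmf_given_risk(y, alpha, at_risk):
    """DM log-pmf over at-risk taxa only; taxa not at risk must have zero counts."""
    if any(yk > 0 and not r for yk, r in zip(y, at_risk)):
        return -math.inf
    y_r = [yk for yk, r in zip(y, at_risk) if r]
    a_r = [ak for ak, r in zip(alpha, at_risk) if r]
    return dm_logpmf(y_r, a_r)


def zidm_logpmf(y, alpha, p_risk):
    """Marginal ZIDM log-pmf, summing over at-risk indicators of zero-count taxa.

    Assumes sum(y) > 0 (so some taxon is always at risk) and 0 < p_risk[k] < 1.
    The enumeration is exponential in the number of zeros: toy sizes only.
    """
    zero_idx = [k for k, yk in enumerate(y) if yk == 0]
    # Taxa with positive counts are necessarily at risk.
    base = sum(math.log(p_risk[k]) for k, yk in enumerate(y) if yk > 0)
    terms = []
    for pattern in itertools.product([True, False], repeat=len(zero_idx)):
        at_risk = [yk > 0 for yk in y]
        logp = base
        for k, r in zip(zero_idx, pattern):
            at_risk[k] = r
            logp += math.log(p_risk[k] if r else 1.0 - p_risk[k])
        terms.append(logp + zidm_logpmf_given_risk(y, alpha, at_risk))
    m = max(terms)  # log-sum-exp for numerical stability
    return m + math.log(sum(math.exp(t - m) for t in terms))


def concurrent_alpha(t, x_t, theta0, theta1):
    """Concentrations alpha_k(t) = exp(beta0_k(t) + beta1_k(t) * x(t)).

    Each coefficient function uses a hypothetical basis [1, t, t^2];
    theta0[k] and theta1[k] hold the basis coefficients for taxon k.
    Only the covariate value at time t enters (a concurrent model).
    """
    basis = [1.0, t, t * t]
    alpha = []
    for th0, th1 in zip(theta0, theta1):
        beta0 = sum(b * c for b, c in zip(basis, th0))
        beta1 = sum(b * c for b, c in zip(basis, th1))
        alpha.append(math.exp(beta0 + beta1 * x_t))
    return alpha


if __name__ == "__main__":
    alpha = concurrent_alpha(2.0, 0.5, [[0.1, 0.0, 0.0]] * 3, [[0.2, -0.1, 0.0]] * 3)
    print(zidm_logpmf([12, 0, 5], alpha, [0.9, 0.6, 0.8]))
```

In a full model the at-risk probabilities and the basis coefficients would carry their own priors and covariate links; that layer is omitted here.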
High sparsity (i.e., excessive zeros) in microbiome data is unavoidable and can substantially alter analysis results. However, efforts to address it have been limited, in part because zeros in microbiome data can arise from multiple sources, making it impossible to justify the validity of any single treatment of them. In this study, we first demonstrate theoretically and empirically that, when the source of zeros is unknown, treating all zeros as missing values is more robust than treating them as structural zeros (i.e., true absence) or rounded zeros (i.e., present but below the detection limit). We then introduce a novel multiple imputation method developed specifically for highly sparse, high-dimensional compositional data. Extensive simulation studies demonstrate the robustness of the proposed approach and its benefits for downstream analyses. Finally, we reanalyze a type II diabetes (T2D) dataset to identify species that are differentially abundant between T2D patients and non-diabetic controls.
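The abstract does not specify the imputation model itself, but the workflow around it (treat zeros as missing, complete each imputed dataset, transform it, then pool estimates across imputations) follows standard multiple-imputation practice. The sketch below assumes the completed datasets come from some external imputation model; it marks zeros as missing, applies a centered log-ratio (CLR) transform to a completed composition, and combines per-imputation estimates with Rubin's rules. The function names are hypothetical and this is not the authors' method.

```python
import math
import statistics


def zeros_to_missing(counts):
    """Treat every zero as missing (None), whatever its unknown source."""
    return [None if c == 0 else c for c in counts]


def clr(composition):
    """Centered log-ratio transform of a strictly positive (completed) composition.

    CLR is scale invariant, so closing the composition to sum to 1 first is optional.
    """
    logs = [math.log(v) for v in composition]
    center = statistics.fmean(logs)
    return [lv - center for lv in logs]


def pool_rubin(estimates, variances):
    """Pool m per-imputation estimates and their variances with Rubin's rules.

    Returns (pooled estimate, total variance, degrees of freedom).
    Requires m >= 2 and a nonzero between-imputation variance.
    """
    m = len(estimates)
    q_bar = statistics.fmean(estimates)   # pooled point estimate
    u_bar = statistics.fmean(variances)   # mean within-imputation variance
    b = statistics.variance(estimates)    # between-imputation variance
    total = u_bar + (1 + 1 / m) * b
    df = (m - 1) * (1 + u_bar / ((1 + 1 / m) * b)) ** 2
    return q_bar, total, df
```

A differential-abundance test would run once per completed dataset (for example on the CLR coordinates), and only the resulting estimates and variances are passed to `pool_rubin`.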
Keywords
Excess zeros
Composition
High dimension
Microbiome
Multiple imputation
Causal mediation analysis provides critical insights into how exposures influence outcomes through intermediate variables, or mediators. In this study, we examine mediation effects in complex structured data, focusing on brain connectivity networks derived from functional MRI (fMRI). Capturing these mediation pathways is essential for understanding neurobiological mechanisms, yet the high dimensionality of brain connectivity data challenges traditional mediation methods. To address this, we apply manifold learning techniques to project high-dimensional connectivity matrices onto lower-dimensional latent spaces, preserving node-level characteristics and facilitating the identification of key mediating brain regions. We also leverage a joint sampling strategy within a Bayesian framework to retain mediator-specific features while handling sparsity and complexity in the data. These methodological advances improve mediation effect estimation and provide deeper insight into the pathways linking exposures to outcomes, contributing to mediation analysis for complex neuroimaging data.
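The target of estimation can be illustrated in its simplest form. With linear models, one scalar mediator (think of it as a single node's latent coordinate), no exposure-mediator interaction, and the usual no-unmeasured-confounding assumptions, the natural indirect effect reduces to the product a * b of the exposure-to-mediator and mediator-to-outcome coefficients. The sketch below is a frequentist, single-mediator toy with hypothetical function names; it does not implement the manifold projection or the Bayesian joint sampling described above.

```python
import random
import statistics


def _cov(u, v):
    """Sample covariance of two equal-length sequences."""
    mu, mv = statistics.fmean(u), statistics.fmean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (len(u) - 1)


def mediation_effects(x, m, y):
    """Product-of-coefficients decomposition for one scalar mediator (OLS).

    Mediator model: m = a0 + a * x + e1
    Outcome model:  y = b0 + c_prime * x + b * m + e2
    Returns (indirect = a * b, direct = c_prime, total = slope of y on x).
    """
    sxx, smm, sxm = _cov(x, x), _cov(m, m), _cov(x, m)
    sxy, smy = _cov(x, y), _cov(m, y)
    a = sxm / sxx
    # Solve the centered 2x2 normal equations for (c_prime, b).
    det = sxx * smm - sxm ** 2
    c_prime = (smm * sxy - sxm * smy) / det
    b = (sxx * smy - sxm * sxy) / det
    total = sxy / sxx
    return a * b, c_prime, total


def simulate(n, a=0.5, b=2.0, c_prime=1.0, seed=0):
    """Hypothetical data-generating process with Gaussian noise."""
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]
    m = [a * xi + rng.gauss(0, 1) for xi in x]
    y = [c_prime * xi + b * mi + rng.gauss(0, 1) for xi, mi in zip(x, m)]
    return x, m, y
```

With OLS fitted on the same sample, the total effect splits exactly into direct plus indirect; with many candidate mediators (such as every brain region's latent coordinates), this simple decomposition becomes unstable, which is the setting that dimension reduction and joint Bayesian sampling target.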
Keywords
Bayesian Modelling
Causal Mediation Analysis
Brain connectivity network
Dimension reduction
Environmental mixture approaches currently struggle to accommodate compositional outcomes, which consist of vectors constrained to the unit simplex. This limitation makes it difficult to evaluate the associations between multiple concurrent environmental exposures and such outcomes. As a result, there is a pressing need for analytical methods that can more accurately capture these complex relationships.
Here, we extend the Bayesian weighted quantile sum (BWQS) regression framework to jointly model compositional outcomes and environmental mixtures, using a Dirichlet distribution with a multinomial logit link function. The proposed approach, named Dirichlet-BWQS (D-BWQS), allows simultaneous estimation of the mixture weights associated with each exposure mixture component and of the association between the overall exposure mixture index and each of the outcome proportions.
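In weighted quantile sum regression, each exposure is scored into quantiles and the scores are combined into a single index with non-negative weights that sum to one; that index then enters the outcome model. The sketch below strings these pieces together with a Dirichlet likelihood whose means come from a multinomial logit link. It is a minimal sketch under stated assumptions, not the D-BWQS implementation: quartile scoring, the last outcome part as the reference category, a single precision parameter `phi` with alpha = phi * mu, and all function names are illustrative choices, and the priors and posterior sampling are omitted.

```python
import math


def quantile_scores(values, n_q=4):
    """Rank-based quantile scores 0..n_q-1 (ties broken by position, an assumption)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    scores = [0] * len(values)
    for rank, i in enumerate(order):
        scores[i] = rank * n_q // len(values)
    return scores


def wqs_index(q_row, weights):
    """Weighted quantile sum: sum_j w_j * q_j, with weights on the simplex."""
    if abs(sum(weights) - 1.0) > 1e-9 or min(weights) < 0:
        raise ValueError("weights must be non-negative and sum to 1")
    return sum(w * q for w, q in zip(weights, q_row))


def dirichlet_mean(index, beta0, beta1):
    """Multinomial-logit means; the last outcome part is the reference (eta = 0)."""
    etas = [b0 + b1 * index for b0, b1 in zip(beta0, beta1)] + [0.0]
    top = max(etas)  # subtract the max for numerical stability
    expd = [math.exp(e - top) for e in etas]
    s = sum(expd)
    return [e / s for e in expd]


def dirichlet_logpdf(y, alpha):
    """Log-density of Dirichlet(alpha) at a composition y in the open simplex."""
    return (math.lgamma(sum(alpha)) - sum(math.lgamma(a) for a in alpha)
            + sum((a - 1) * math.log(v) for a, v in zip(alpha, y)))


def d_bwqs_loglik(y, index, beta0, beta1, phi):
    """Log-likelihood of one composition, with alpha = phi * mu (assumed form)."""
    mu = dirichlet_mean(index, beta0, beta1)
    return dirichlet_logpdf(y, [phi * m for m in mu])
```

Each non-reference outcome part k gets its own slope `beta1[k]` on the shared index, which mirrors the abstract's "association between the overall exposure mixture index and each of the outcome proportions" relative to the reference part.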
Multi-omics studies now profile complementary molecular layers (genome, transcriptome, proteome, and metabolome) in the same biospecimens, generating massive matrices whose joint structure encodes biological regulation. Low-rank factor models are a proven tool for distilling such high-dimensional data into interpretable molecular modules, yet current approaches typically analyze one omics layer at a time or look for interactions between pairs of layers. Ignoring the joint structure across all layers sacrifices both statistical power and biological plausibility.
We propose a matrix factorization framework that integrates overlapping pathway annotations while jointly decomposing multiple omics matrices. Its methodological novelties include (i) an interaction-aware group sparsity penalty that encourages factors to respect the partially overlapping pathways defined for each omics layer and induces sign consistency within every selected pathway, and (ii) a factor-level false discovery rate control strategy based on stability selection, which delivers finite-sample guarantees on module reproducibility while balancing the contribution of each view.
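Stability selection, the building block behind the factor-level false discovery rate control, can be sketched generically at the feature level: refit a sparse selector on many random half-samples, keep the features selected in at least a fraction `pi_thr` of them, and bound the expected number of false selections with the Meinshausen and Bühlmann (2010) inequality E[V] <= q^2 / ((2 * pi_thr - 1) * p), which assumes exchangeable noise features and a base selector no worse than random guessing. The base selector here (the top q features by absolute marginal correlation) and all function names are hypothetical stand-ins; this does not implement the authors' factor-level procedure or its guarantees.

```python
import math
import random
import statistics


def abs_corr(u, v):
    """Absolute Pearson correlation; 0.0 if either vector is constant."""
    mu, mv = statistics.fmean(u), statistics.fmean(v)
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    if suu == 0 or svv == 0:
        return 0.0
    return abs(suv) / math.sqrt(suu * svv)


def top_q_selector(columns, y, q):
    """Hypothetical base selector: indices of the q features most correlated with y."""
    scores = [abs_corr(col, y) for col in columns]
    return sorted(range(len(columns)), key=lambda j: -scores[j])[:q]


def stability_selection(columns, y, q, n_subsamples=100, pi_thr=0.8, seed=0):
    """Selection frequencies over random half-samples and the stably selected features."""
    rng = random.Random(seed)
    n = len(y)
    counts = [0] * len(columns)
    for _ in range(n_subsamples):
        idx = rng.sample(range(n), n // 2)
        sub_cols = [[col[i] for i in idx] for col in columns]
        sub_y = [y[i] for i in idx]
        for j in top_q_selector(sub_cols, sub_y, q):
            counts[j] += 1
    freq = [c / n_subsamples for c in counts]
    return freq, [j for j, f in enumerate(freq) if f >= pi_thr]


def mb_false_selection_bound(q, p, pi_thr):
    """Meinshausen-Buhlmann bound on E[false selections]; needs 0.5 < pi_thr <= 1."""
    if not 0.5 < pi_thr <= 1:
        raise ValueError("pi_thr must lie in (0.5, 1]")
    return q * q / ((2 * pi_thr - 1) * p)
```

Raising `pi_thr` or shrinking `q` tightens the bound at the cost of selecting fewer features; a factor-level version would apply the same resampling logic to whole factors or pathway groups rather than individual features.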
Extensive simulations reflecting realistic pathway overlap show that our method improves estimation efficiency.
An open-source R implementation with a high-performance C++ (Armadillo) back end facilitates deployment in single-omics, multi-omics, or phenotype-association studies, and the framework extends naturally to multivariate regression with overlapping feature and outcome selection. By embedding pathway knowledge into multi-omics factorization, our approach advances both interpretability and statistical power in contemporary molecular biology.
Keywords
LOW-RANK
FACTOR ANALYSIS
OVERLAPPING CLUSTERING
PATHWAY ANNOTATION
PENALIZATION
OPTIMIZATION