Methods for multivariate response data with complex exposure-response relations

Ander Wilson Chair
Colorado State University
 
Ander Wilson Organizer
Colorado State University
 
Elena Colicino Organizer
Mount Sinai School of Medicine
 
Sunday, Aug 3: 4:00 PM - 5:50 PM
0821 
Topic-Contributed Paper Session 
Music City Center 
Room: CC-207D 

Applied

Yes

Main Sponsor

Biometrics Section

Co Sponsors

Section on Statistics in Epidemiology
WNAR

Presentations

A Bayesian Functional Concurrent Zero-Inflated Dirichlet Multinomial Regression Model

The human microbiome undergoes dynamic shifts in composition over time, exemplified by rapid changes in newborns or by shifts following dietary changes. Modeling the influence of exposures or treatments on microbial composition over time is essential to understanding the factors that drive these transitions. Dirichlet-multinomial (DM) regression models are often used to investigate the relation between observed covariates and microbial data because they can accommodate potential overdispersion and the compositional structure of the data. However, traditional DM regression models are not equipped to handle repeated measures data, ignore the zero-inflation that is characteristic of microbiome data, and assume the effect of a covariate is constant throughout the study period. Additionally, alternative methods for modeling longitudinal microbiome data often overlook the compositional structure of the data and time-varying effects. To fill these gaps, we propose a functional concurrent zero-inflated Dirichlet-multinomial (FunC-ZIDM) regression model, which is designed to model time-varying relations between observed covariates and microbial taxa while accounting for zero-inflation, compositional structure, and repeated measures. Through simulation, we demonstrate the model's ability to estimate the relative abundance of compositional elements and to scale to large compositional spaces. We apply our model to investigate time-varying associations between infants' microbial composition and both breast milk intake and gestational age at birth during the 11-week postnatal period.
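As a rough illustration of the ingredients named in this abstract, the sketch below simulates counts from a zero-inflated Dirichlet-multinomial whose concentration depends on a covariate through time-varying coefficients. It is not the FunC-ZIDM model itself: the log link, the coefficient functions `beta0` and `beta1`, the zero-inflation probabilities, and the helper names `concentration` and `sample_zidm` are all assumptions made for this example.

```python
import numpy as np

def concentration(x_t, t, beta0, beta1):
    """Assumed log-linear form: gamma_j(t) = exp(beta0_j(t) + beta1_j(t) * x(t))."""
    return np.exp(beta0(t) + beta1(t) * x_t)

def sample_zidm(gamma, zero_prob, depth, rng):
    """Draw one count vector from a zero-inflated Dirichlet-multinomial (illustrative)."""
    gamma = np.asarray(gamma, dtype=float)
    counts = np.zeros(gamma.size, dtype=int)
    # Taxon j is treated as structurally absent with probability zero_prob[j].
    present = rng.random(gamma.size) >= zero_prob
    if not present.any():
        return counts  # every taxon absent: return an all-zero vector
    # Dirichlet draw over the present taxa via normalized gamma variates.
    g = rng.gamma(shape=gamma[present], scale=1.0)
    counts[present] = rng.multinomial(depth, g / g.sum())
    return counts

# Hypothetical coefficient functions for three taxa (not taken from the abstract).
beta0 = lambda t: np.array([0.5, 0.0, -0.5])
beta1 = lambda t: np.array([np.sin(t), 0.0, -0.2 * t])

rng = np.random.default_rng(1)
times = np.linspace(0.0, 2.0, 50)
counts = np.vstack([
    sample_zidm(concentration(1.0, t, beta0, beta1), np.full(3, 0.3), 500, rng)
    for t in times
])
```

Each row of `counts` is one observation at a given time; rows sum to the sequencing depth unless every taxon was zeroed out, in which case the row is all zeros.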

Co-Author(s)

Ander Wilson, Colorado State University
Matthew Koslovsky, Colorado State University

Speaker

Brody Erlandson

A multiple imputation method for compositional microbiome data

High sparsity (i.e., excessive zeros) in microbiome data is unavoidable and can significantly alter analysis results. However, efforts to address this high sparsity have been limited, in part because it is impossible to justify the validity of any such methods, as zeros in microbiome data can arise from multiple sources. In this study, we first demonstrate theoretically and empirically that treating all zeros as missing values is a more robust approach than treating them as structural zeros (i.e., true absence) or rounded zeros (i.e., undetected due to a detection limit) when the source of zeros is unknown. We then introduce a novel multiple imputation method developed specifically for highly sparse, high-dimensional compositional data. The robustness of the proposed approach, along with its beneficial effects on downstream analyses, is demonstrated through extensive simulation studies. Finally, we reanalyze a type II diabetes (T2D) dataset to determine differentially abundant species between T2D patients and non-diabetic controls.

Keywords

Excess zeros

Composition

High dimension

Microbiome

Multiple imputation 

Speaker

Michael Sohn, University of Rochester

Bayesian joint modelling for high-dimensional network mediation analysis

Causal mediation analysis provides critical insights into how exposures influence outcomes through intermediate variables, or mediators. In this study, we examine mediation effects in complex-structured data, focusing on brain connectivity networks derived from fMRI. Capturing these mediation pathways is essential for understanding neurobiological mechanisms, yet the high dimensionality of brain connectivity data presents challenges for traditional mediation methods. To address this, we apply manifold learning techniques to project high-dimensional connectivity matrices onto lower-dimensional latent spaces, preserving node-level characteristics and facilitating the identification of key mediating brain regions. Additionally, we leverage a joint sampling strategy within a Bayesian framework to retain mediator-specific features while effectively handling sparsity and complexity in the data. These methodological advancements enhance causal inference by improving mediation effect estimation and providing deeper insights into the pathways linking exposures to outcomes. This work contributes to advancing mediation analysis for complex neuroimaging data. 

Keywords

Bayesian Modelling

Causal Mediation Analysis

Brain connectivity network

Dimension reduction 

Speaker

Jingyan Fu, Rice University

Compositional Outcomes and Environmental Chemical Mixtures: the Dirichlet-Bayesian Weighted Quantile Sum Regression

Environmental mixture approaches currently struggle to accommodate compositional outcomes, that is, vectors constrained to the unit simplex. This limitation makes it difficult to evaluate the associations between multiple concurrent environmental exposures and their respective impacts on the outcomes. As a result, there is a pressing need for analytical methods that can more accurately assess the complexity of these relationships.
Here, we extend the Bayesian weighted quantile sum regression (BWQS) framework to jointly model compositional outcomes and environmental mixtures using a Dirichlet distribution with a multinomial logit link function. The proposed approach, named Dirichlet-BWQS (D-BWQS), allows for the simultaneous estimation of the mixture weight associated with each exposure mixture component as well as the association between the overall exposure mixture index and each of the outcome proportions.
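The structure described above can be sketched as a forward simulation: exposures are scored into quantiles, combined into a single index with weights on the simplex, and mapped to outcome proportions through a multinomial logit (softmax) link that sets the mean of a Dirichlet distribution. This is only a sketch under stated assumptions, not the authors' implementation or a fitting routine: the quartile scoring, the fixed weights `w`, the coefficients `alpha` and `beta` (with the first category as reference), the precision `phi`, and the function names are all hypothetical.

```python
import numpy as np

def quantile_scores(X, n_quantiles=4):
    """Score each exposure column into quantile bins 0..n_quantiles-1."""
    scores = np.empty(X.shape, dtype=int)
    for j in range(X.shape[1]):
        # Interior cut points (e.g. the three quartile boundaries).
        cuts = np.quantile(X[:, j], np.linspace(0, 1, n_quantiles + 1)[1:-1])
        scores[:, j] = np.searchsorted(cuts, X[:, j], side="right")
    return scores

def dirichlet_mean(index, alpha, beta):
    """Multinomial logit link: outcome proportions as a function of the mixture index."""
    eta = alpha[None, :] + np.outer(index, beta)   # shape (n, K)
    eta -= eta.max(axis=1, keepdims=True)          # stabilize the softmax
    expd = np.exp(eta)
    return expd / expd.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
X = rng.lognormal(size=(100, 3))        # three simulated exposures
w = np.array([0.5, 0.3, 0.2])           # assumed mixture weights (sum to 1)
index = quantile_scores(X) @ w          # weighted quantile sum index

alpha = np.array([0.0, 0.2, -0.1])      # reference category fixed at 0
beta = np.array([0.0, 0.4, -0.3])       # index effect on each log-ratio
mu = dirichlet_mean(index, alpha, beta)

phi = 20.0                              # assumed Dirichlet precision
Y = np.vstack([rng.dirichlet(phi * m) for m in mu])  # compositional outcomes
```

In a Bayesian fit, `w`, `alpha`, `beta`, and `phi` would be given priors and estimated jointly; here they are fixed only to show how the index and the link connect exposures to proportions.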

Speaker

Elena Colicino, Mount Sinai School of Medicine

Pathway-Aware Low-Rank Factorization and Regression for Interpretable Multi-Omics Analysis

Multi-omics studies now profile complementary molecular layers (genome, transcriptome, proteome, and metabolome) in the same biospecimens, generating massive matrices whose joint structure encodes biological regulation. Low-rank factor models are a proven tool for distilling such high-dimensional data into interpretable molecular modules, yet current approaches typically analyze one omics layer at a time or look for interactions between pairs of them. This omission sacrifices both statistical power and biological plausibility.
We propose an advanced matrix factorization framework that seamlessly integrates overlapping pathway annotations while co-decomposing multiple omics matrices. Methodological novelties include (i) an interaction-aware group sparsity penalty that encourages factors to respect partially overlapping pathways defined for each omics layer and induces sign consistency on every selected pathway, and (ii) a factor-level false discovery rate control strategy based on stability selection, delivering finite-sample guarantees on module reproducibility while balancing the contribution of each view.
Through extensive simulations reflecting realistic pathway overlap, our method improves estimation efficiency.
An open-source R implementation built on a high-performance C++ (Armadillo) back end facilitates deployment to single-omics, multi-omics, or phenotype-association studies, and the framework naturally extends to multivariate regression for overlapping feature and outcome selection. By embedding pathway knowledge into multi-omics factorization, our approach advances both interpretability and statistical power in contemporary molecular biology.

Keywords

LOW-RANK

FACTOR ANALYSIS

OVERLAPPING CLUSTERING

PATHWAY ANNOTATION

PENALIZATION

OPTIMIZATION 

Co-Author

Siyuan Ma

Speaker

Eric Koplin, Vanderbilt University