Sunday, Aug 3: 4:00 PM - 5:50 PM
0647
Topic-Contributed Paper Session
Music City Center
Room: CC-209A
This session presents statistical methodologies for analyzing health and microbiome data. The talks will cover statistical techniques for longitudinal omics and health data analysis, integration of taxonomic and functional profiles for biomarker identification, microbiome data meta-analysis, and causal microbial biomarker identification. Finally, we will discuss how deep data and remote monitoring can transform healthcare. Together, these presentations highlight the latest advancements in computational tools and interdisciplinary strategies, aiming to enhance our understanding of complex biological systems and improve health outcomes through precision medicine.
Applied
No
Main Sponsor
Section on Statistics in Genomics and Genetics
Co Sponsors
Biometrics Section
ENAR
Presentations
Accurate identification of microbial biomarkers is hindered by the unique characteristics of microbiome data, which often result in excessive false positives and reduced statistical power. To address these challenges, we introduce Zinck, a knockoff-based feature selection framework equipped with a high-fidelity knockoff generator. Zinck effectively captures key properties of microbiome data, including zero inflation, complex correlation structures, high variability, and strong batch effects. Through simulations, we demonstrate Zinck's superior statistical power and its robust control of false positives. In real data applications, Zinck successfully identifies biologically relevant microbial biomarkers for colorectal cancer and inflammatory bowel disease, significantly enhancing disease prediction accuracy.
Keywords
microbiome data
knockoff filters
FDR control for high-dimensional data
compositional data
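As a rough illustration of the selection step that knockoff-based methods share, the sketch below applies the standard knockoff+ threshold to a vector of feature statistics. It is not Zinck's implementation: the knockoff generator is not shown, and the function names and the example statistics are hypothetical. It assumes each feature j already has a statistic W_j that is large and positive when the real feature looks more important than its knockoff copy.

```python
def knockoff_plus_threshold(w, q=0.1):
    """Smallest t with (1 + #{W_j <= -t}) / max(1, #{W_j >= t}) <= q."""
    # Candidate thresholds are the distinct nonzero magnitudes, ascending.
    candidates = sorted({abs(x) for x in w if x != 0})
    for t in candidates:
        est_false = 1 + sum(1 for x in w if x <= -t)     # estimated false selections
        n_selected = max(1, sum(1 for x in w if x >= t))  # features that would be selected
        if est_false / n_selected <= q:
            return t
    return float("inf")  # no threshold meets the target; select nothing


def knockoff_select(w, q=0.1):
    """Indices of features whose statistic reaches the knockoff+ threshold."""
    t = knockoff_plus_threshold(w, q)
    return [j for j, x in enumerate(w) if x >= t]


# Hypothetical statistics for ten taxa (illustrative values only).
w_example = [5, 4, 3.5, 3, 2.5, 2, -1, 1.5, 0.5, -0.2]
selected = knockoff_select(w_example, q=0.2)
```

The "+1" in the numerator is what distinguishes knockoff+ from the plain knockoff filter and is what gives finite-sample FDR control at level q when the knockoffs are valid.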
Recently, the microbiome has gained significant attention as a potential predictor of human diseases. However, identifying robust, validated, and powerful microbial biomarkers remains challenging due to the complexity of microbiome data, including both taxonomic and functional profiles. Studies have shown that taxonomic profiles typically offer greater predictive performance and are easier to apply in practical and clinical settings but exhibit higher variability. In contrast, functional profiles are more stable and interpretable in terms of biological mechanisms but tend to have lower predictive performance. In this study, we propose a robust microbial risk score (MRS) framework that integrates both taxonomic and functional profiles to identify a microbial sub-community capable of serving as biomarkers for disease susceptibility. Specifically, we first identify a sub-community of microbial taxa associated with disease using the taxonomic profile, following a similar approach to our MRS version 1. We then expand this sub-community by incorporating additional microbial taxa based on their functional similarities with the identified taxa and calculate the weighted diversities of the sub-community as the proposed MRSs. Through comprehensive real-data analyses using human microbiome datasets from the curatedMetagenomicData R package, we demonstrate the utility of the proposed MRS framework for disease prediction. Moreover, the incorporation of functional profiles can be seamlessly integrated into other predictive methods, such as random forests, to enhance predictive performance.
Keywords
microbial risk score
taxonomic profile
functional profile
disease prediction
Co-Author
Huilin Li, New York University
Speaker
Chan Wang, New York University, School of Medicine
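The two steps described in the abstract above (expand a seed sub-community by functional similarity, then score each sample by a diversity of that sub-community) can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' MRS code: the function names, the pairwise similarity dictionary, the similarity cutoff, and the use of an unweighted Shannon index in place of the weighted diversities are all hypothetical simplifications.

```python
import math


def expand_by_function(seed, functional_similarity, threshold):
    """Add any taxon whose functional similarity to a seed taxon meets the cutoff.

    functional_similarity maps (taxon_a, taxon_b) pairs to a similarity score;
    this data structure and the cutoff are assumptions made for illustration.
    """
    expanded = set(seed)
    for (a, b), score in functional_similarity.items():
        if score >= threshold:
            if a in seed:
                expanded.add(b)
            if b in seed:
                expanded.add(a)
    return expanded


def shannon_subcommunity(abundances, taxa):
    """Shannon index of one sample restricted to `taxa`, after renormalizing."""
    values = [abundances.get(t, 0.0) for t in taxa]
    total = sum(values)
    if total == 0:
        return 0.0  # none of the sub-community taxa are present in this sample
    return -sum((v / total) * math.log(v / total) for v in values if v > 0)


# Hypothetical inputs: one seed taxon and three pairwise similarities.
seed_taxa = {"A"}
similarity = {("A", "C"): 0.9, ("B", "D"): 0.95, ("A", "B"): 0.2}
sub_community = expand_by_function(seed_taxa, similarity, threshold=0.8)

sample = {"A": 0.3, "B": 0.4, "C": 0.3}
score = shannon_subcommunity(sample, sub_community)
```

Only taxa similar to a seed taxon are added, so "D" stays out even though it is highly similar to "B".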
Identifying key microbial features associated with clinical outcomes, host factors, and other covariates is central to advancing microbiome research. Recent microbiome association studies have scaled up significantly, incorporating a greater number of microbial features, covariates, and datasets from diverse populations and cohorts. The unique characteristics of microbiome data pose significant challenges for association analysis, particularly in large-scale and meta-analytic contexts, often leading to low replicability of findings. To address these challenges, we introduce PALM, a semi-parametric statistical framework designed for robust, scalable, and generalizable microbiome association discovery in large-scale studies and meta-analyses. Extensive realistic simulations demonstrate PALM's advantages in false discovery rate control, statistical power, computational efficiency, and cross-study signal harmonization. PALM's utility is illustrated through real data applications.
Keywords
microbiome
association analysis
meta-analysis
The transition from health to chronic disease inevitably involves changes in individual clinical biomarkers. These changes may be small when considered on a value-by-value basis over a short period of time; however, the aggregate signal of multiple measures may provide a practical way of detecting relevant change early and serve as an indicator for preventive interventions. Existing evidence suggests that the dynamics of omics and clinical measurements over time are important in biomedical data. This motivates the need for robust methodologies for modeling and inference with such longitudinal data. We have recently introduced a novel multivariate approach to longitudinal microbiome data analysis, the multivariate distance drift-diffusion framework (MD3F). This framework summarizes multivariate trends over time in omics and health data. In this talk, we give examples of applying MD3F to microbiome and clinical laboratory data. We further discuss approaches for testing population differences in multivariate drift.