09. Comparing preprocessing methods on inference of exposomic and metabolomic data with application to liver disease health outcomes in a clinical study

Conference: Women in Statistics and Data Science 2025
11/13/2025: 2:30 PM - 4:00 PM EST
Speed 

Description

Exposomics and metabolomics data, like other high-throughput data, require preprocessing before downstream statistical analysis. One step is normalization, which is often performed with standard techniques (e.g., quantile, sum, median, reference sample, or reference feature) readily available in specialized software (e.g., MetaboAnalyst). As costs fall, analyses of exposomics and metabolomics in larger cohorts will continue to increase and will include a richer set of clinical and patient characteristics that may be beneficial in the normalization process. Current software does not allow normalization by these additional characteristics, including class factors (i.e., biological sample classifiers) or other important study design features (e.g., age, sex, and other patient characteristics). Herein, we examine the performance of a normalization procedure that accounts for study design features in a cohort study, using simulation studies and an environmentally exposed cohort. We compare its results with those of the standard normalization options to determine the best methods for assessing differentially expressed exposomic and metabolomic features in liver outcomes. We find that the methods give similar results for some of these data, but that the outcomes differ depending on the normalization method. Future studies may benefit from including known clinical and study design features in their analyses.
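For readers new to these terms, the standard normalizations named above can be sketched in a few lines of NumPy. This is a minimal illustration that assumes a samples-by-features matrix of strictly positive intensities with no missing values or ties. The function names are hypothetical, and the last function, `covariate_adjust`, is only one assumed way of accounting for design features (removing linear covariate effects from log intensities). It is not the procedure evaluated in this abstract, which is not specified here.

```python
import numpy as np

def sum_normalize(X):
    """Scale each sample (row) so its feature intensities sum to 1."""
    return X / X.sum(axis=1, keepdims=True)

def median_normalize(X):
    """Divide each sample (row) by its own median intensity."""
    return X / np.median(X, axis=1, keepdims=True)

def quantile_normalize(X):
    """Give every sample (row) the same intensity distribution."""
    order = np.argsort(X, axis=1)
    ranks = np.argsort(order, axis=1)            # rank of each value within its row
    reference = np.sort(X, axis=1).mean(axis=0)  # mean sorted profile across samples
    return reference[ranks]

def covariate_adjust(X, covariates):
    """Hypothetical sketch: remove linear covariate effects from log intensities.

    Assumption, not the abstract's method: each feature is regressed on the
    covariates (with an intercept) and only the covariate part is subtracted.
    """
    log_x = np.log(X)
    design = np.column_stack([np.ones(len(covariates)), covariates])
    beta, *_ = np.linalg.lstsq(design, log_x, rcond=None)
    return log_x - design[:, 1:] @ beta[1:]
```

In this sketch, `covariates` would be a numeric array with one row per sample (for example, age and a 0/1 indicator for sex). A class factor that should be preserved rather than removed would need different handling, which is outside the scope of this illustration.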

Keywords

Exposomics

Metabolomics

Cohort Studies

Normalization

Liver Disease 

Presenting Author

Christina Pinkston, University of Louisville and Biostats, Health Inform & Data Sci, University of Cincinnati College of Medicine

First Author

Christina Pinkston, University of Louisville and Biostats, Health Inform & Data Sci, University of Cincinnati College of Medicine

CoAuthor(s)

Shesh Rai, Biostats, Health Inform & Data Sci, University of Cincinnati College of Medicine
Matthew Cave, University of Louisville

Target Audience

Beginner

Tracks

Knowledge
Women in Statistics and Data Science 2025