Statistical Challenges with High Throughput Outcomes in Environmentally Exposed Population Cohorts

Shesh Rai Co-Author
University of Louisville
 
Matthew Cave Co-Author
University of Louisville
 
Christina Pinkston First Author
University of Louisville and Biostats, Health Inform & Data Sci, University of Cincinnati College of Medicine
 
Christina Pinkston Presenting Author
University of Louisville and Biostats, Health Inform & Data Sci, University of Cincinnati College of Medicine
 
Thursday, Aug 7: 11:35 AM - 11:50 AM
2019 
Contributed Papers 
Music City Center 
Cohort studies that analyze high throughput (HTP) data will become more prevalent as the cost barriers of these technologies fall. Because of differences in study design, observational studies, including cohort studies, may face statistical challenges not found in randomized studies. For instance, cohort studies may include more than two groups, heterogeneity within groups, or unequal sample sizes, or they may lack adequate sample size and power to detect differences in HTP measures. We review the challenges faced when associating HTP data (e.g., miRNAs, exposomics) with disease status, with an application to the discovery of liver disease biomarkers in a residential cohort exposed to environmental toxins. Among other issues, we identified heterogeneity, inappropriate data dimensionality reduction, improper handling of multiple testing, inadequate HTP data pre-processing procedures, unmet model assumptions, and misidentified biological relevance as barriers to a properly executed biomarker discovery analysis in a population cohort. Correctly accounting for these factors should result in more robust, unbiased statistical findings.
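As an illustration of one issue named in the abstract, the handling of multiple testing, the sketch below applies the Benjamini-Hochberg false discovery rate adjustment to a set of per-feature p-values. The abstract does not state which correction the authors used, so the choice of Benjamini-Hochberg, the function name `benjamini_hochberg`, the threshold `alpha=0.05`, and the example p-values are all assumptions made for illustration only.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Benjamini-Hochberg step-up adjustment (illustrative sketch).

    Returns (adjusted_p_values, rejected_flags), both in the input order.
    """
    m = len(p_values)
    if m == 0:
        return [], []

    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])

    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that the
    # adjusted values are non-decreasing in the original p-values.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        candidate = p_values[idx] * m / rank
        running_min = min(running_min, candidate)
        adjusted[idx] = running_min

    # A feature is declared significant when its adjusted p-value
    # does not exceed the target false discovery rate.
    rejected = [q <= alpha for q in adjusted]
    return adjusted, rejected


# Hypothetical p-values, e.g. one test per miRNA (not data from the study).
example_p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205]
adjusted_p, rejected = benjamini_hochberg(example_p, alpha=0.05)
```

With these made-up inputs, five p-values fall below 0.05 before adjustment, but only the two smallest remain significant afterward, which shows how uncorrected testing across many HTP features can overstate the number of candidate biomarkers.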

Keywords

high throughput studies

biomarkers

heterogeneity

cohort studies

liver disease

environmental exposures 

Main Sponsor

Section on Statistics in Genomics and Genetics