Variable selection with FDR control for noisy data – an application to screening metabolites that are associated with breast and colorectal cancer

Presented During: Methods Research in the Women's Health Initiative: Addressing Bias, Messy Data, and Multiplicity Issues in Cohort Studies

Runqiu Wang, Speaker

Thursday, Aug 8: 9:35 AM - 9:55 AM
Topic-Contributed Paper Session

Oregon Convention Center

The rapidly expanding field of metabolomics offers an invaluable resource for understanding the associations between metabolites and various diseases. However, the high dimensionality, missing values, and measurement errors typical of metabolomics data make it difficult to develop reliable and reproducible approaches for disease association studies, so there is a compelling need for robust statistical analyses that can handle these complexities. In this paper, we construct algorithms that perform variable selection for noisy data and control the False Discovery Rate (FDR) when selecting mutual metabolomic predictors for multiple disease outcomes. We illustrate the versatility and performance of this procedure in various scenarios involving missing data and measurement errors. Applying our method to the Women's Health Initiative data, we identify metabolites that are associated with breast cancer, colorectal cancer, or both, demonstrating the practical utility of the method for identifying consistent risk factors and understanding shared disease mechanisms.
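The abstract does not describe the selection algorithm itself, so the sketch below is not the authors' method. It only illustrates, under that caveat, what controlling the False Discovery Rate means in a screening setting, using the standard Benjamini-Hochberg step-up procedure as a stand-in. The function name `benjamini_hochberg`, the target level `q`, and the p-values are all hypothetical and chosen for illustration; they are not results from the Women's Health Initiative data.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return the sorted indices of hypotheses rejected by the
    Benjamini-Hochberg step-up procedure at target FDR level q.

    Generic illustration of FDR control; NOT the procedure from the talk.
    """
    m = len(p_values)
    if m == 0:
        return []

    # Indices of the p-values, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Find the largest rank k such that p_(k) <= k * q / m.
    k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / m:
            k = rank

    # Reject the k hypotheses with the smallest p-values.
    return sorted(order[:k])


# Invented p-values, one per hypothetical metabolite (illustration only).
example_p = [0.02, 0.03, 0.9]
selected = benjamini_hochberg(example_p, q=0.05)
print(selected)
```

In this toy input the smallest p-value misses its own threshold, but the second smallest meets its threshold, so the step-up rule selects both. That behavior is what distinguishes the procedure from a simple per-test cutoff.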