14. Evaluating variable selection methods to detect DNA methylation patterns predictive of longitudinal outcomes: The impact of ignoring within-subject correlation

Conference: Women in Statistics and Data Science 2025
11/13/2025: 2:30 PM - 4:00 PM EST
Speed 

Description

DNA methylation (DNAm) is a promising biomarker for quantifying biological aging and has been shown to be predictive of various health outcomes, including mortality, frailty, and chronic disease risk. However, its high dimensionality and longitudinal measurement structure present challenges for variable selection and model interpretation. This study assesses the performance of three penalized regression methods: LASSO with minimum cross-validation error (LASSO-min), LASSO with the one-standard-error rule (LASSO-1se), and the Sparse Penalized Linear Mixed Model (SPLMM). The methods are compared on simulated data that emulate key features of longitudinal DNAm studies, such as random slopes and missing data. The LASSO methods assume independence across observations and do not account for within-subject correlation, whereas SPLMM incorporates random effects to model such dependence explicitly. Performance was evaluated using sensitivity, specificity, mean squared error, and the Matthews correlation coefficient. Simulation results suggest that LASSO-1se achieves a desirable balance between predictive accuracy and model sparsity, even in the presence of moderate serial correlation. While SPLMM appropriately models within-subject correlation, it is substantially more computationally demanding and does not yield consistent gains. Real data applications will consider the selection of DNAm patterns to predict longitudinal outcomes, including settings subject to loss to follow-up.
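
As an illustration of the evaluation criteria and the one-standard-error rule named in the abstract, the following is a minimal Python sketch. It is not the authors' code: the function names, arguments, and toy numbers are hypothetical, and it assumes the truly predictive features are known, as they would be in a simulation.

```python
import math


def selection_metrics(true_support, selected, n_features):
    """Sensitivity, specificity, and Matthews correlation coefficient (MCC)
    for a selected feature set, given the known set of true predictors."""
    true_support, selected = set(true_support), set(selected)
    tp = len(true_support & selected)   # true predictors that were selected
    fp = len(selected - true_support)   # noise features that were selected
    fn = len(true_support - selected)   # true predictors that were missed
    tn = n_features - tp - fp - fn      # noise features correctly excluded
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention: report MCC as 0 when the denominator is 0.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"sensitivity": sensitivity, "specificity": specificity, "mcc": mcc}


def min_and_one_se_lambda(lambdas, cv_mean, cv_se):
    """Return (lambda_min, lambda_1se) from per-lambda cross-validation results.

    lambda_min minimizes the mean CV error; lambda_1se is the largest penalty
    whose mean CV error is within one standard error of that minimum, which
    gives a sparser model.
    """
    i_min = min(range(len(lambdas)), key=lambda i: cv_mean[i])
    threshold = cv_mean[i_min] + cv_se[i_min]
    eligible = [lam for lam, err in zip(lambdas, cv_mean) if err <= threshold]
    return lambdas[i_min], max(eligible)


# Toy inputs (made up for illustration only).
metrics = selection_metrics(true_support={0, 1, 2, 3},
                            selected={0, 1, 2, 10},
                            n_features=20)
lam_min, lam_1se = min_and_one_se_lambda(lambdas=[1.0, 0.5, 0.1, 0.01],
                                         cv_mean=[2.0, 1.25, 1.0, 1.1],
                                         cv_se=[0.3, 0.3, 0.3, 0.3])
print(metrics, lam_min, lam_1se)
```

In a simulation like the one described, metrics of this kind would be computed for each method's selected set on each replicate and then averaged. MCC is useful in this setting because true predictors are typically far outnumbered by noise features, so specificity alone can look high even when selection is poor.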

Keywords

Variable selection for correlated data

Longitudinal data

Epigenetics

Penalized regression

Simulation study 

Presenting Author

Wenjing Liu

First Author

Wenjing Liu

CoAuthor

Fernanda Schumacher, Ohio State University

Target Audience

Mid-Level

Tracks

Knowledge
Women in Statistics and Data Science 2025