Wednesday, Aug 6: 10:30 AM - 12:20 PM
4166
Contributed Posters
Music City Center
Room: CC-Hall B
Main Sponsor
Korean International Statistical Society
Presentations
Accurate prediction of complex traits and diseases is crucial for advancing personalized medicine and preventive healthcare. However, guidelines for selecting optimal polygenic risk score (PRS) models under diverse conditions remain limited, often leaving researchers to rely on generalized assumptions rather than tailored methodologies. In this study, we systematically evaluated PRS prediction models using both simulation-based experiments and real-world datasets covering height, body mass index (BMI), type 2 diabetes (T2D), and glaucoma. We compared various PRS models across key factors: (1) trait heritability, (2) number of SNPs, (3) proportion of causal SNPs, and (4) trait prevalence. Our results highlight key distinctions between infinitesimal models, which assume all SNPs contribute to traits, and non-infinitesimal models, which consider only a subset of SNPs as causal. Specifically, we demonstrate that non-infinitesimal models, such as LDpred and PRScs, outperform infinitesimal models when the proportion of causal SNPs is low, a characteristic common to many phenotypes. Additionally, sample size was a critical determinant of performance, with LDpred excelling in smaller datasets and PRScs outperforming LDpred in larger datasets.
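The four simulation factors listed in this abstract can be illustrated with a minimal sketch. It is not the authors' simulation design: the function names `simulate_cohort` and `polygenic_score`, the allele-frequency range, the normal effect-size distribution, and the liability-threshold rule for case status are all assumptions made for illustration, and linkage disequilibrium between SNPs is ignored.

```python
import numpy as np


def simulate_cohort(n_samples, n_snps, causal_fraction, heritability,
                    prevalence, seed=0):
    """Simulate standardized genotypes, sparse effects, liability, and case status."""
    rng = np.random.default_rng(seed)

    # Allele dosages (0, 1, 2) drawn per SNP from a hypothetical MAF range.
    maf = rng.uniform(0.05, 0.5, size=n_snps)
    dosages = rng.binomial(2, maf, size=(n_samples, n_snps)).astype(float)
    genotypes = (dosages - dosages.mean(axis=0)) / dosages.std(axis=0)

    # Only a fraction of SNPs are causal; the rest have zero effect.
    n_causal = max(1, int(round(causal_fraction * n_snps)))
    causal_idx = rng.choice(n_snps, size=n_causal, replace=False)
    effects = np.zeros(n_snps)
    effects[causal_idx] = rng.normal(size=n_causal)

    # Rescale the genetic component so it explains `heritability` of the variance.
    genetic = genotypes @ effects
    genetic = (genetic - genetic.mean()) / genetic.std() * np.sqrt(heritability)
    noise = rng.normal(scale=np.sqrt(1.0 - heritability), size=n_samples)
    liability = genetic + noise

    # Liability-threshold model: the top `prevalence` fraction are cases.
    threshold = np.quantile(liability, 1.0 - prevalence)
    is_case = liability > threshold
    return genotypes, effects, liability, is_case


def polygenic_score(genotypes, weights):
    """PRS for each individual: weighted sum of standardized dosages."""
    return genotypes @ weights


if __name__ == "__main__":
    G, beta, liability, is_case = simulate_cohort(
        n_samples=2000, n_snps=500, causal_fraction=0.05,
        heritability=0.5, prevalence=0.1, seed=1)
    score = polygenic_score(G, beta)
    r2 = np.corrcoef(score, liability)[0, 1] ** 2
    print(f"causal SNPs: {np.count_nonzero(beta)}, r^2: {r2:.3f}, "
          f"case rate: {is_case.mean():.3f}")
```

The sketch scores individuals with the true simulated effects, so the squared correlation with liability sits near the chosen heritability. In practice the weights are estimated from GWAS summary statistics, which is where methods such as LDpred and PRScs differ.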
Keywords
Polygenic risk score (PRS)
PRS models
Computational tools
UK Biobank
Understanding the links between diet, metabolic changes, and health outcomes is a key focus in nutritional science and broader biological research. Analyzing relationships, such as those between ultra-processed food (UPF) intake and metabolites, offers insights into potential biomarkers for diet-related diseases and public health applications. However, these analyses are challenging due to high-dimensional data structures and complex, often nonlinear associations between covariates and health outcomes. Traditional linear models and conventional nonparametric methods often lack the flexibility to accurately capture such complexities in biological data. To address these challenges, we propose a high-dimensional partial linear regression model that captures both linear and nonlinear effects, combining the interpretability of linear models with the adaptability of nonparametric approaches. Our model leverages trend filtering to handle local smoothness variations effectively and achieves minimax optimal rates, making it suitable for complex biological datasets. We apply this model to data from the Interactive Diet and Activity Tracking in AARP (IDATA) Study, demonstrating its utility.
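For orientation, a partially linear model with a trend-filtering penalty is commonly written as below. This is a generic sketch rather than the authors' exact formulation: the lasso penalty on the linear coefficients, the tuning parameters \lambda_1 and \lambda_2, and the order k are assumptions, and \theta_i = f(z_i) denotes the nonparametric component evaluated at the ordered values of z.

```latex
% Partially linear model: linear covariates x_i, nonlinear covariate z_i.
\[
  y_i = x_i^{\top}\beta + f(z_i) + \varepsilon_i, \qquad i = 1, \dots, n .
\]
% One possible penalized estimator (assumed form):
% D^{(k+1)} is the discrete difference operator of order k+1,
% so the last term penalizes changes in the k-th discrete derivative of f.
\[
  (\hat\beta, \hat\theta)
  = \arg\min_{\beta \in \mathbb{R}^{p},\; \theta \in \mathbb{R}^{n}}
    \frac{1}{2n} \lVert y - X\beta - \theta \rVert_2^2
    + \lambda_1 \lVert \beta \rVert_1
    + \lambda_2 \lVert D^{(k+1)} \theta \rVert_1 .
\]
```

The \ell_1 penalty on differences lets the fitted curve change smoothness locally, which is the property the abstract refers to as handling local smoothness variations.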
Keywords
High-dimensional data analysis
Partial linear models
Trend filtering
Ultra-processed food biomarkers
Latent Class Analysis (LCA) is a widely used model-based clustering technique for discrete data. However, as in many other statistical models, including irrelevant variables in an LCA model can substantially reduce its efficiency and reliability. Traditional variable selection methods in LCA often rely on stepwise algorithms, which can be computationally intensive and suboptimal. In this study, we reformulate the LCA model as a log-linear model and apply penalized maximum likelihood estimation to achieve simultaneous parameter estimation and variable selection. Through numerical studies, we compare our approach with existing methods and demonstrate its effectiveness using a real dataset.
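The reformulation mentioned in this abstract can be sketched with the standard log-linear representation of a latent class model. The notation, the group-type penalty p_\gamma, and the scaling by n are assumptions for illustration; the abstract does not state which penalty the authors use. Here Z is the latent class with K levels and Y_1, \dots, Y_J are the observed categorical variables.

```latex
% Usual LCA mixture form: class weights \pi_k, item-response probabilities \rho.
\[
  P(Y = y) = \sum_{k=1}^{K} \pi_k \prod_{j=1}^{J} \rho_{j,\, y_j \mid k} .
\]
% Equivalent log-linear form for the joint distribution of (Z, Y):
\[
  \log P(Z = k, Y = y)
  = \lambda_0 + \lambda^{Z}_{k}
    + \sum_{j=1}^{J} \lambda^{Y_j}_{y_j}
    + \sum_{j=1}^{J} \lambda^{Z Y_j}_{k,\, y_j} .
\]
% Penalized maximum likelihood (assumed group penalty on interaction terms);
% \ell is the observed-data log-likelihood, marginalized over Z.
\[
  \hat\lambda = \arg\max_{\lambda} \;
    \ell(\lambda) - n \sum_{j=1}^{J} p_{\gamma}\bigl( \lVert \lambda^{Z Y_j} \rVert_2 \bigr) .
\]
```

In this form, variable Y_j carries no clustering information exactly when all of its interaction terms \lambda^{Z Y_j} are zero, so shrinking those groups to zero performs variable selection during estimation.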
Keywords
Latent Class Analysis
Variable selection
Penalized maximum likelihood
Expectation-Maximization Algorithm