Contributed Poster Presentations: Korean International Statistical Society

Shirin Golchi, Chair
McGill University
 
Wednesday, Aug 6: 10:30 AM - 12:20 PM
4166 
Contributed Posters 
Music City Center 
Room: CC-Hall B 

Main Sponsor

Korean International Statistical Society

Presentations

23: Comparative Evaluation of Statistical Learning Methods for Polygenic Prediction in UK Biobank

Accurate prediction of complex traits and diseases is crucial for advancing personalized medicine and preventive healthcare. However, guidelines for selecting optimal polygenic risk score (PRS) models under diverse conditions remain limited, often leaving researchers to rely on generalized assumptions rather than tailored methodologies. In this study, we systematically evaluated PRS prediction models using both simulation-based experiments and real-world data on height, body mass index (BMI), type 2 diabetes (T2D), and glaucoma. We compared various PRS models across key factors: (1) trait heritability, (2) number of single-nucleotide polymorphisms (SNPs), (3) proportion of causal SNPs, and (4) trait prevalence. Our results highlight key distinctions between infinitesimal models, which assume all SNPs contribute to traits, and non-infinitesimal models, which consider only a subset of SNPs as causal. Specifically, we demonstrate that non-infinitesimal models, such as LDpred and PRScs, outperform infinitesimal models when the proportion of causal SNPs is low, a characteristic common to many phenotypes. Additionally, sample size was a critical determinant of performance, with LDpred excelling in smaller datasets and PRScs outperforming LDpred in larger datasets.
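
The simulation factors named in the abstract (heritability, number of SNPs, proportion of causal SNPs) can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names, the independent-SNP genotype model (no linkage disequilibrium), and all parameter values are assumptions chosen for illustration. A PRS is computed here simply as a weighted sum of standardized allele counts.

```python
import numpy as np


def simulate_effects(n_snps, causal_fraction, rng):
    """Draw per-SNP effects; causal_fraction=1.0 mimics an infinitesimal trait."""
    n_causal = max(1, int(round(causal_fraction * n_snps)))
    beta = np.zeros(n_snps)
    causal = rng.choice(n_snps, size=n_causal, replace=False)
    beta[causal] = rng.normal(size=n_causal)
    return beta


def simulate_genotypes(n_samples, n_snps, rng):
    """Independent SNPs (assumption: no LD), standardized column by column."""
    maf = rng.uniform(0.05, 0.5, size=n_snps)
    counts = rng.binomial(2, maf, size=(n_samples, n_snps)).astype(float)
    return (counts - counts.mean(axis=0)) / counts.std(axis=0)


def polygenic_score(genotypes, weights):
    """PRS as a weighted sum of standardized allele counts."""
    return genotypes @ weights


def simulate_phenotype(genotypes, beta, heritability, rng):
    """Scale the genetic component so it explains `heritability` of the variance."""
    genetic = polygenic_score(genotypes, beta)
    genetic = genetic / genetic.std() * np.sqrt(heritability)
    noise = rng.normal(0.0, np.sqrt(1.0 - heritability), size=genetic.shape[0])
    return genetic + noise


rng = np.random.default_rng(0)
G = simulate_genotypes(2000, 500, rng)
beta_sparse = simulate_effects(500, 0.02, rng)  # non-infinitesimal: 2% causal
beta_dense = simulate_effects(500, 1.0, rng)    # infinitesimal: all SNPs causal
y = simulate_phenotype(G, beta_sparse, 0.5, rng)

# Squared correlation between the true-effect score and the phenotype;
# this should sit near the simulated heritability.
r2 = np.corrcoef(polygenic_score(G, beta_sparse), y)[0, 1] ** 2
print(f"R^2 of the oracle score: {r2:.3f}")
```

In a comparison like the one described, estimated weights from each PRS method would replace the true effects in `polygenic_score`, and the resulting R^2 would be tracked as the causal fraction, heritability, and sample size vary.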

Keywords

Polygenic risk score (PRS)

PRS models

Computational tools

UK Biobank 

Co-Author

Seunghwan Park, Soongsil University

First Author

Wonil Chung, Soongsil University

Presenting Author

Wonil Chung, Soongsil University

24: High-dimensional Partial Linear Model with Trend Filtering

Understanding the links between diet, metabolic changes, and health outcomes is a key focus in nutritional science and broader biological research. Analyzing relationships, such as those between ultra-processed food (UPF) intake and metabolites, offers insights into potential biomarkers for diet-related diseases and public health applications. However, these analyses are challenging due to high-dimensional data structures and complex, often nonlinear associations between covariates and health outcomes. Traditional linear models and conventional nonparametric methods often lack the flexibility to accurately capture such complexities in biological data. To address these challenges, we propose a high-dimensional partial linear regression model that captures both linear and nonlinear effects, combining the interpretability of linear models with the adaptability of nonparametric approaches. Our model leverages trend filtering to handle local smoothness variations effectively and achieves minimax optimal rates, making it suitable for complex biological datasets. We apply this model to data from the Interactive Diet and Activity Tracking in AARP (IDATA) Study, demonstrating its utility. 
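
For orientation, one common way to write a sparse partial linear model with a trend filtering penalty is sketched below. The abstract does not state the estimator, so the notation, the lasso penalty on the linear part, and the tuning parameters \lambda_1 and \lambda_2 are assumptions, not the authors' formulation. Here x_i holds the high-dimensional covariates, z_i is the covariate entering nonlinearly (observations sorted by z_i), \theta_i = f(z_i), and D^{(k+1)} is the discrete difference operator of order k+1.

```latex
\begin{align}
  y_i &= x_i^\top \beta + f(z_i) + \varepsilon_i, \qquad i = 1, \dots, n, \\
  (\hat\beta, \hat\theta) &= \arg\min_{\beta \in \mathbb{R}^p,\; \theta \in \mathbb{R}^n}
    \frac{1}{2n} \lVert y - X\beta - \theta \rVert_2^2
    + \lambda_1 \lVert \beta \rVert_1
    + \lambda_2 \lVert D^{(k+1)} \theta \rVert_1 .
\end{align}
```

The \ell_1 penalty on D^{(k+1)}\theta yields a piecewise polynomial fit of degree k whose knots are chosen adaptively, which is what lets trend filtering follow local changes in smoothness.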

Keywords

High-dimensional data analysis

Partial linear models

Trend filtering

Ultra-processed food biomarkers 

Co-Author(s)

Erikka Loftfield, National Cancer Institute
Hyokyoung Hong, NIH
Haolei Weng, Michigan State University

First Author

Sang Kyu Lee, National Cancer Institute

Presenting Author

Sang Kyu Lee, National Cancer Institute

25: Penalized Maximum Likelihood Estimation in Latent Class Analysis

Latent Class Analysis (LCA) is a widely used model-based clustering technique for discrete data. However, as in many statistical models, including irrelevant variables in an LCA model can substantially reduce its efficiency and reliability. Traditional variable selection methods in LCA often rely on stepwise algorithms, which can be computationally intensive and suboptimal. In this study, we reformulate the LCA model as a log-linear model and apply penalized maximum likelihood estimation to achieve simultaneous parameter estimation and variable selection. Through numerical studies, we compare our approach with existing methods, and we demonstrate its effectiveness using a real dataset.
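
A sketch of how a log-linear reformulation can turn variable selection into a penalization problem is given below. The abstract does not give the parameterization or the penalty, so the notation, the group-type penalty p_\gamma, and the identifiability constraints (omitted here) are assumptions for illustration only. With J categorical items Y_1, \dots, Y_J and a latent class C:

```latex
\begin{align}
  P(Y = y) &= \sum_{c=1}^{K} \pi_c \prod_{j=1}^{J} \rho_{j, y_j \mid c}, \\
  \log P(Y = y, C = c) &= \lambda_0 + \lambda^{C}_{c}
    + \sum_{j=1}^{J} \lambda^{j}_{y_j}
    + \sum_{j=1}^{J} \lambda^{jC}_{y_j c}, \\
  \hat\lambda &= \arg\max_{\lambda} \; \ell(\lambda)
    - n \sum_{j=1}^{J} p_\gamma\!\left( \lVert \lambda^{jC} \rVert_2 \right).
\end{align}
```

Under this parameterization, item j carries no clustering information when every interaction term \lambda^{jC}_{y_j c} is zero, so shrinking the whole block \lambda^{jC} to zero removes the variable while the remaining parameters are estimated in the same fit.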

Keywords

Latent Class Analysis

Variable selection

Penalized maximum likelihood

Expectation-Maximization Algorithm 

Co-Author

Byungtae Seo, Sungkyunkwan University

First Author

Jimin Park

Presenting Author

Jimin Park