
Modeling continuous genetic ancestry to improve risk prediction across diverse populations

Presented During: Risk Scores and Polygenic Modeling of Genomic and Genetic Data

Haoyu Zhang Co-Author
National Cancer Institute

Rahul Mazumder Co-Author
Massachusetts Institute of Technology

Xihong Lin Co-Author
Harvard T.H. Chan School of Public Health

Tony Chen First Author
Harvard University

Tony Chen Presenting Author
Harvard University

Sunday, Aug 3: 4:05 PM - 4:20 PM
1153
Contributed Papers

Music City Center

Polygenic risk scores are widely used in disease risk stratification, but their accuracy varies across diverse populations. Recent methods leverage large-scale multi-ancestry data to improve accuracy in under-represented populations but require labelling individuals by ancestry for prediction. This poses challenges for practical use, as clinical practices are typically not based on ancestry. We propose SPLENDID, a novel penalized regression framework for diverse biobank-scale data. Our method utilizes ancestry principal component interactions to model genetic ancestry as a continuum within a single prediction model for all ancestries, eliminating the need for discrete labels. In extensive simulations and analyses of 9 traits from the All of Us Research Program (N=224,364) and UK Biobank (N=340,140), SPLENDID significantly outperformed existing methods in prediction accuracy and model sparsity. By directly modeling continuous genetic ancestry, SPLENDID stands as a valuable tool for robust risk prediction across diverse populations and fairer clinical implementation.
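
To make the idea of ancestry principal component interactions concrete, the following is a minimal sketch and not the SPLENDID implementation. It assumes a linear score of the form score = sum_j g_j * (beta_j + sum_k gamma_jk * pc_k), where g_j is a variant dosage, pc_k is an ancestry principal component, and beta_j and gamma_jk are main and interaction coefficients. The abstract does not state this model form; the function names, argument names, and coefficient layout are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence


def interaction_features(genotypes: Sequence[float],
                         pcs: Sequence[float]) -> List[float]:
    """Expand one individual's genotypes into main-effect and
    genotype-by-PC interaction features (assumed layout:
    [g_1..g_p, g_1*pc_1..g_1*pc_K, ..., g_p*pc_1..g_p*pc_K])."""
    features = list(genotypes)
    for g in genotypes:
        features.extend(g * pc for pc in pcs)
    return features


def ancestry_adjusted_score(genotypes: Sequence[float],
                            pcs: Sequence[float],
                            main_effects: Sequence[float],
                            interaction_effects: Sequence[Sequence[float]]) -> float:
    """Score one individual under the assumed model
    score = sum_j g_j * (beta_j + sum_k gamma_jk * pc_k).

    No discrete ancestry label is used: the effective effect of each
    variant changes smoothly with the individual's PC coordinates.
    """
    score = 0.0
    for g, beta, gammas in zip(genotypes, main_effects, interaction_effects):
        # Per-variant effect shifted by the individual's position in PC space.
        effect = beta + sum(gamma * pc for gamma, pc in zip(gammas, pcs))
        score += g * effect
    return score


# Toy example: 3 variants, 2 ancestry PCs (all values are made up).
toy_score = ancestry_adjusted_score(
    genotypes=[0, 1, 2],
    pcs=[0.5, -1.0],
    main_effects=[0.1, 0.2, 0.3],
    interaction_effects=[[0.0, 0.0], [0.4, 0.0], [0.0, 0.1]],
)
```

In a penalized fit over features like these, a sparsity-inducing penalty (for example a lasso-type penalty, which is an assumption here) would set many beta_j and gamma_jk to exactly zero, so a variant keeps an interaction term only where its effect appears to vary along the ancestry continuum.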

Keywords

polygenic risk scores

genetic ancestry

penalized regression

All of Us

UK Biobank

genetic interactions

Main Sponsor

Section on Statistics in Genomics and Genetics