Advanced Computational and Graphical Methods in Genomics and Genetics

Jie Ren, Chair
Indiana University School of Medicine
 
Sunday, Aug 3: 4:00 PM - 5:50 PM
4016 
Contributed Papers 
Music City Center 
Room: CC-209B 

Main Sponsor

Section on Statistics in Genomics and Genetics

Presentations

Identifying high-dimensional genomic factors associated with biological networks

Gaussian graphical models are widely used to construct networks for analyzing associations among biological features, e.g., gene expression, microbial taxa, and metabolites. However, there is no general statistical framework for investigating how genomic factors influence these networks, particularly when the number of candidate regulators is large. In this work, we propose an efficient algorithm to identify high-dimensional genomic factors associated with biological networks. Our two-step procedure first constructs a base network without incorporating genomic factors and then identifies genomic factors that modify edges of the inferred network. We also develop a permutation-based approach for accurate false discovery rate control. We illustrate the utility of our method through three applications: (i) identifying host genetic variants that regulate the oral microbiome network in the Prostate, Lung, Colorectal, and Ovarian Cancer Screening Trial, (ii) detecting metagenomic features that influence the gut metabolite network in colorectal cancer, and (iii) mapping somatic mutations that regulate gene expression networks in lung adenocarcinoma using data from The Cancer Genome Atlas. 
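The abstract does not spell out its permutation procedure. As a generic illustration only, a common permutation-based FDR estimate divides the average number of null (permuted) statistics exceeding a threshold by the number of observed statistics exceeding it. The sketch below assumes that larger statistics indicate stronger edge modification and that `permuted` holds statistics recomputed after shuffling each genomic factor across samples; the function names are hypothetical and are not the authors' software.

```python
import numpy as np


def permutation_fdr(observed, permuted, t):
    """Estimate the FDR when calling every factor with statistic >= t.

    observed: shape (m,), one statistic per candidate genomic factor.
    permuted: shape (B, m), statistics recomputed under B permutations
              (hypothetical input, assumed to represent the null).
    """
    observed = np.asarray(observed, dtype=float)
    permuted = np.asarray(permuted, dtype=float)
    n_called = np.sum(observed >= t)
    if n_called == 0:
        return 0.0
    # Average number of null statistics exceeding t per permutation.
    expected_false = np.sum(permuted >= t) / permuted.shape[0]
    return min(1.0, expected_false / n_called)


def select_threshold(observed, permuted, alpha=0.1):
    """Return the smallest observed threshold whose estimated FDR is <= alpha."""
    best = None
    # Scan thresholds from the largest observed statistic downward.
    for t in np.sort(np.unique(observed))[::-1]:
        if permutation_fdr(observed, permuted, t) <= alpha:
            best = t
    return best
```

Factors whose observed statistic is at least the returned threshold would be reported; `None` means no threshold reaches the target FDR.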

Keywords

Genomics

Gaussian graphical models

Statistical Genetics

Microbiome

Networks

High-dimensional data analysis 

Co-Author(s)

Jianxin Shi
Paul Albert, National Cancer Institute

First Author

Samuel Anyaso-Samuel, National Cancer Institute

Presenting Author

Samuel Anyaso-Samuel, National Cancer Institute

Accounting for Spatial Correlation in Graphical Analysis of Spatial Transcriptomics Data

Understanding how gene networks vary across spatial regions, conditions, and cell types is essential for decoding tissue organization and disease mechanisms. Spatial transcriptomics (ST) technologies provide gene expression data with spatial context, but estimating gene-gene correlations remains challenging due to spatial autocorrelation among cells, which can produce spurious associations and obscure true biological relationships. Existing methods often ignore spatial structure or lack scalability for single-cell-resolution data. In this project, we present a flexible and efficient approach for estimating gene correlation networks from single-cell ST data by first removing spatial correlation. This preprocessing step enables the use of standard co-expression and network inference methods while avoiding spatial confounding. Our approach improves reproducibility, reveals biologically meaningful associations, and facilitates robust comparisons of gene networks across tissue regions, cell types, and experimental conditions. 
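The abstract does not state how spatial correlation is removed, although its keywords mention splines. As a minimal sketch under that assumption, one can regress each gene's expression on an additive cubic-spline basis of the cell coordinates and compute co-expression on the residuals. This only removes a smooth spatial trend; it is a stand-in, not the authors' method, and the function names are hypothetical.

```python
import numpy as np


def spline_basis(u, n_knots=5):
    """Cubic truncated-power spline basis for one coordinate (no intercept)."""
    u = np.asarray(u, dtype=float)
    u = (u - u.mean()) / u.std()  # standardize for numerical stability
    # Interior knots at evenly spaced quantiles of the coordinate.
    knots = np.quantile(u, np.linspace(0, 1, n_knots + 2)[1:-1])
    cols = [u, u**2, u**3] + [np.clip(u - k, 0, None) ** 3 for k in knots]
    return np.column_stack(cols)


def residual_coexpression(expr, coords, n_knots=5):
    """Gene-gene correlations after regressing out a smooth spatial trend.

    expr:   (n_cells, n_genes) normalized expression matrix.
    coords: (n_cells, 2) spatial x/y coordinates.
    """
    n = expr.shape[0]
    design = np.column_stack([
        np.ones(n),
        spline_basis(coords[:, 0], n_knots),
        spline_basis(coords[:, 1], n_knots),
    ])
    # Least-squares fit of every gene on the same spatial basis at once.
    coef, *_ = np.linalg.lstsq(design, expr, rcond=None)
    resid = expr - design @ coef
    return np.corrcoef(resid, rowvar=False)
```

The residual correlation matrix can then be passed to any standard co-expression or network inference method, which is the role the abstract assigns to the preprocessing step.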

Keywords

Spatial transcriptomics

co-expression

Splines 

Co-Author

Ali Shojaie, University of Washington

First Author

Ana Gabriela Vasconcelos

Presenting Author

Ana Gabriela Vasconcelos

A Regularized Hierarchical Model for Incorporating Annotation Information in Predictive Omic Studies

High-dimensional omics data are often accompanied by "meta-features", such as biological pathways and functional annotations, that can be informative for predicting an outcome of interest. We introduce a regularized hierarchical framework for integrating meta-features, with the goal of improving prediction and feature selection performance for time-to-event outcomes.
Meta-features are incorporated through the hierarchical framework, and regularization is applied to both the omic features and the meta-features so that high-dimensional data can be handled at both levels. The proposed hierarchical Cox model can be efficiently fitted by a combination of iteratively reweighted least squares and cyclic coordinate descent.
Simulations show that when the external meta-features are informative, the regularized hierarchical model can substantially improve prediction performance over standard regularized Cox regression. We illustrate the proposed model with applications to breast cancer and melanoma survival based on gene expression profiles, which show improved prediction performance from incorporating meta-features, as well as the discovery of important omic feature sets. 
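The hierarchical Cox model above is fitted with iteratively reweighted least squares and cyclic coordinate descent. As a simplified sketch of the coordinate-descent part only, the code below swaps in a squared-error loss for the Cox partial likelihood and assumes a specific hierarchy: feature effects beta = gamma + Z alpha, where Z is the feature-by-meta-feature matrix, with a ridge penalty on gamma and a lasso penalty on alpha. Under that reparameterization the linear predictor is X gamma + (XZ) alpha, so both levels can be updated one coordinate at a time on the augmented design [X, XZ]. The penalty choices and the function name are assumptions for illustration; in an IRLS scheme, a weighted version of this inner loop would be run on each working response.

```python
import numpy as np


def soft_threshold(z, lam):
    """Lasso proximal step: shrink z toward zero by lam."""
    return np.sign(z) * max(abs(z) - lam, 0.0)


def hierarchical_cd(X, Z, y, lam_gamma=1.0, lam_alpha=0.1, n_iter=200):
    """Hypothetical sketch: cyclic coordinate descent minimizing

        (1/2n)||y - X gamma - X Z alpha||^2
        + (lam_gamma/2)||gamma||^2 + lam_alpha ||alpha||_1,

    and returning beta = gamma + Z alpha together with alpha.
    """
    n, p = X.shape
    q = Z.shape[1]
    W = np.hstack([X, X @ Z])              # augmented design [X, XZ]
    coef = np.zeros(p + q)                 # [gamma, alpha]
    r = y - W @ coef                       # current residual
    col_ss = (W ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(p + q):
            if col_ss[j] == 0:
                continue                   # skip all-zero columns
            w_j = W[:, j]
            # Inner product of column j with the partial residual (excluding j).
            rho = w_j @ r / n + col_ss[j] * coef[j]
            if j < p:
                new = rho / (col_ss[j] + lam_gamma)              # ridge update
            else:
                new = soft_threshold(rho, lam_alpha) / col_ss[j]  # lasso update
            r -= w_j * (new - coef[j])     # keep residual in sync
            coef[j] = new
    gamma, alpha = coef[:p], coef[p:]
    return gamma + Z @ alpha, alpha
```

A strong ridge penalty on gamma pushes the fit toward effects explained by the meta-features, while the lasso on alpha selects which meta-features matter.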

Keywords

integrated analysis

omics

meta-feature

precision medicine

regularized regression

hierarchical model 

Co-Author(s)

Juan Pablo Lewinger, USC
Eric Kawaguchi

First Author

Dixin Shen, Gilead Sciences

Presenting Author

Dixin Shen, Gilead Sciences

Leveraging cell-type specificity and similarity improves single-cell eQTL fine-mapping

Identifying cell-type-specific expression quantitative trait loci (eQTLs) is important for understanding the genetic regulation of gene expression at the cell-type level and its relevance to complex traits. However, existing eQTL fine-mapping methods have limited power and accuracy when cell types are analyzed separately. To improve eQTL mapping, we present CASE, a Bayesian framework for Cell-type-specific And Shared EQTL fine-mapping that analyzes multiple cell types simultaneously. CASE can effectively capture effect-sharing patterns across cell types while disentangling the confounding effects of linkage disequilibrium (LD). Through comprehensive simulations, we demonstrate that CASE outperforms the existing single-trait (SuSiE) and multi-trait (mvSuSiE) fine-mapping methods. When applied to the OneK1K data, CASE identified more genetic regulatory effects on gene expression and better captured cell-type specificity as well as functionally enriched and disease-associated eQTLs. The CASE framework for single-cell eQTL fine-mapping can be broadly applied to multi-tissue and multi-trait genetic studies. 

Keywords

Empirical Bayes

MCEM

eQTL

causal inference 

Co-Author(s)

Yingxin Lin
Wenxuan Li, UCLA
Leqi Xu
Xiangyu Zhang, Yale University
Hongyu Zhao, Yale University

First Author

Chen Lin

Presenting Author

Chen Lin

Efficient and accurate framework for rare variant associations in biobank-scale time-to-event data

Rare variants (RVs) play a key role in complex disease genetics. Advances in whole-genome and whole-exome sequencing (WGS/WES) have facilitated the identification of RV associations. However, RV analysis has limited statistical power because of low allele frequencies. These issues are compounded for time-to-event (TTE) phenotypes, where high censoring rates and population structure can violate the assumptions of standard association tests.
Here we present GATE-STAAR, an efficient and accurate framework for RV association tests of TTE phenotypes. We extend the burden test, SKAT, and ACAT to TTE phenotypes and use saddlepoint and gamma approximations to calibrate test statistics under extreme censoring. Functional annotations are integrated to improve statistical power. The method accounts for population structure while maintaining computational scalability for biobank-scale datasets.
Through extensive simulations, we demonstrate that GATE-STAAR substantially improves power while keeping the type I error rate well controlled. Applying GATE-STAAR to UK Biobank (UKBB) WGS data from 500K participants, we identified novel signals with implications for disease onset and progression. These findings highlight the promise of RV analyses for advancing our understanding of disease etiology. 
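The TTE-specific score statistics and saddlepoint calibration in GATE-STAAR are beyond a short example, but the standard aggregated Cauchy association test (ACAT) combination step, which ACAT-based methods apply to a set of p-values, can be shown compactly. Each p-value is mapped to a Cauchy variate, and the weighted average has an approximately standard Cauchy tail under the null even when the p-values are correlated. The sketch assumes valid p-values strictly between 0 and 1 and nonnegative weights; it is not the GATE-STAAR code.

```python
import math


def acat(pvalues, weights=None):
    """Aggregated Cauchy association test (ACAT) combination of p-values.

    Each p-value is mapped to tan((0.5 - p) * pi), which equals
    1 / tan(p * pi); weights are normalized to sum to 1.
    """
    if weights is None:
        weights = [1.0] * len(pvalues)
    total = float(sum(weights))
    stat = 0.0
    for p, w in zip(pvalues, weights):
        if not 0.0 < p < 1.0:
            raise ValueError("p-values must lie strictly between 0 and 1")
        # cot(p*pi) equals tan((0.5 - p)*pi) but stays accurate for tiny p.
        stat += (w / total) / math.tan(p * math.pi)
    # Upper-tail probability of a standard Cauchy; the arctan(1/T) form
    # avoids cancellation when T is large.
    if stat > 0:
        return math.atan(1.0 / stat) / math.pi
    return 0.5 - math.atan(stat) / math.pi
```

For example, `acat([0.01, 0.9])` is about 0.022: the combined p-value is driven mainly by the smallest input, which is why ACAT-type tests retain power when only a few variants carry signal.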

Keywords

rare variants

time-to-event phenotypes

biobank studies

saddlepoint approximation

functional annotations

SKAT 

Co-Author(s)

Xihao Li, University of North Carolina at Chapel Hill
Hufeng Zhou, Harvard University
Zilin Li, Northeast Normal University
Xihong Lin, Harvard T.H. Chan School of Public Health

First Author

Shuang Song, Harvard T.H. Chan School of Public Health

Presenting Author

Shuang Song, Harvard T.H. Chan School of Public Health

WITHDRAWN WaveKAT-F: A Genotype-Informed Wavelet-Fourier Transformation for Rare Variant Association Testing

Rare variant association testing presents challenges due to sparsity and the complex, nonlinear nature of genetic interactions. SKAT and similar methods struggle to fully capture multi-scale patterns in these types of genetic variation. We introduce WaveKAT-F, a novel wavelet-Fourier framework that applies a custom-designed wavelet transformation with adaptive weighting and a specialized kernel tailored for genetic signal processing to enhance rare variant detection. By transforming genotype data into a multi-resolution representation, WaveKAT-F improves signal extraction while preserving both localized and global effects. Simulations and real-world data analyses demonstrate that WaveKAT-F achieves higher power than existing methods, particularly in scenarios with mixed effect directions, weak associations, and low-frequency variants, while maintaining well-controlled type I error rates. By integrating custom wavelet transforms, Fourier-based spectral analysis, and a unique kernel, WaveKAT-F provides a robust and flexible approach for identifying rare variant associations. 

Keywords

Rare Variant Association Testing

Custom Wavelet Transforms

Fourier Methods

Spectral Methods

Multi-Resolution Signal Processing

Genetic Association Studies 

Co-Author

Victor Petrescu

First Author

Victor Petrescu