Sunday, Aug 3: 4:00 PM - 5:50 PM
4016
Contributed Papers
Music City Center
Room: CC-209B
Main Sponsor
Section on Statistics in Genomics and Genetics
Presentations
Gaussian graphical models are widely used to construct networks for analyzing associations among biological features such as gene expression, microbial taxa, and metabolites. However, there is no general statistical framework for investigating how genomic factors influence these networks, particularly when the number of candidate regulators is large. In this work, we propose an efficient algorithm to identify high-dimensional genomic factors associated with biological networks. Our two-step procedure first constructs a base network without incorporating genomic factors and then identifies genomic factors that modify edges of the inferred network. We also develop a permutation-based approach for accurate false discovery rate control. We illustrate the utility of our method through three applications: (i) identifying host genetic variants that regulate the oral microbiome network in the Prostate, Lung, Colorectal, and Ovarian Cancer Screening Trial, (ii) detecting metagenomic features that influence the gut metabolite network in colorectal cancer, and (iii) mapping somatic mutations that regulate gene expression networks in lung adenocarcinoma using data from The Cancer Genome Atlas.
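The abstract does not describe how the permutation-based false discovery rate control is carried out. As a minimal sketch of the general idea only, and not the authors' procedure, the FDR at a given threshold can be estimated as the average number of permutation statistics that pass the threshold divided by the number of observed statistics that pass it. The function name `permutation_fdr` and its arguments are hypothetical.

```python
import numpy as np

def permutation_fdr(observed, permuted, threshold):
    """Estimate the FDR at `threshold` from permutation statistics.

    observed : test statistics from the real data, shape (n_tests,)
    permuted : statistics recomputed after permuting the genomic factor,
               shape (n_permutations, n_tests)
    Illustrative assumption: larger statistics mean stronger evidence.
    """
    observed = np.asarray(observed, dtype=float)
    permuted = np.asarray(permuted, dtype=float)

    # Number of discoveries declared on the real data.
    n_discoveries = int(np.sum(observed >= threshold))
    if n_discoveries == 0:
        return 0.0

    # Average number of null statistics passing the threshold per permutation.
    expected_false = np.sum(permuted >= threshold) / permuted.shape[0]

    # The ratio can exceed 1 for lenient thresholds, so cap it.
    return min(1.0, expected_false / n_discoveries)
```

In practice the threshold would be scanned over a grid and the smallest value whose estimated FDR falls below the target level would be used.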
Keywords
Genomics
Gaussian graphical models
Statistical Genetics
Microbiome
Networks
High-dimensional data analysis
Understanding how gene networks vary across spatial regions, conditions, and cell types is essential for decoding tissue organization and disease mechanisms. Spatial Transcriptomics (ST) technologies provide gene expression data with spatial context, but estimating gene-gene correlations remains challenging due to spatial autocorrelation among cells, which can produce spurious associations and obscure true biological relationships. Existing methods often ignore spatial structure or lack scalability for single-cell resolution data. In this project, we present a flexible and efficient approach for estimating gene correlation networks from single-cell ST data by first removing spatial correlation. This preprocessing step enables the use of standard co-expression and network inference methods while avoiding spatial confounding. Our approach improves reproducibility, reveals biologically meaningful associations, and facilitates robust comparisons of gene networks across tissue regions, cell types, and experimental conditions.
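The preprocessing idea, removing spatial structure before computing gene-gene correlations, can be illustrated with a short sketch. The abstract does not give the authors' model; the code below swaps in a low-order polynomial basis in the two spatial coordinates as a simple stand-in for a spline basis, fits it to each gene by least squares, and correlates the residuals. The names `spatial_basis` and `residual_correlation` are hypothetical.

```python
import numpy as np

def spatial_basis(coords, degree=3):
    """Polynomial basis in (x, y) up to total degree `degree`.

    Assumption: a polynomial surface is used here only as a simple
    stand-in for a spline basis.
    """
    x, y = coords[:, 0], coords[:, 1]
    columns = [x**i * y**j
               for i in range(degree + 1)
               for j in range(degree + 1 - i)]
    return np.column_stack(columns)

def residual_correlation(expr, coords, degree=3):
    """Gene-gene correlation after regressing out a smooth spatial trend.

    expr   : expression matrix, shape (n_cells, n_genes)
    coords : spatial coordinates, shape (n_cells, 2)
    """
    basis = spatial_basis(np.asarray(coords, dtype=float), degree)
    expr = np.asarray(expr, dtype=float)
    # Least-squares fit of the spatial trend for all genes at once.
    coef, *_ = np.linalg.lstsq(basis, expr, rcond=None)
    residuals = expr - basis @ coef
    return np.corrcoef(residuals, rowvar=False)
```

Two genes that share only a spatial gradient look strongly correlated in the raw data but close to uncorrelated in the residuals, which is the spurious association the abstract refers to.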
Keywords
Spatial transcriptomics
co-expression
Splines
High-dimensional omics data are often accompanied by "meta-features", such as biological pathways and functional annotations, that can be informative for predicting an outcome of interest. We introduce a regularized hierarchical framework for integrating meta-features, with the goal of improving prediction and feature selection performance with time-to-event outcomes.
A hierarchical framework is deployed to incorporate meta-features. Regularization is applied to the omic features as well as the meta-features so that high-dimensional data can be handled at both levels. The proposed hierarchical Cox model can be fitted efficiently by a combination of iteratively reweighted least squares and cyclic coordinate descent.
Simulations show that when the external meta-features are informative, the regularized hierarchical model can substantially improve prediction performance over standard regularized Cox regression. We illustrate the proposed model with applications to breast cancer and melanoma survival based on gene expression profiles, which show improved prediction performance from incorporating meta-features, as well as the discovery of important omic feature sets.
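The abstract does not state the objective function. One plausible way to write a two-level regularized Cox model, given here as an assumption for illustration, lets the omic coefficients $\beta$ shrink toward a linear function of the meta-feature matrix $Z$ with coefficients $\alpha$, where $\ell(\beta)$ is the Cox partial log-likelihood and $\lambda_1, \lambda_2 \ge 0$ are tuning parameters (all of these symbols are ours, not the authors'):

```latex
\min_{\beta,\,\alpha}\;
  -\ell(\beta)
  \;+\; \frac{\lambda_1}{2}\,\lVert \beta - Z\alpha \rVert_2^2
  \;+\; \lambda_2\,\lVert \alpha \rVert_1
```

Under this form, the first penalty regularizes the omic features around their meta-feature-predicted values and the second induces sparsity among the meta-features, which matches the description of regularization at both levels.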
Keywords
integrated analysis
omics
meta-feature
precision medicine
regularized regression
hierarchical model
Identifying cell-type-specific expression quantitative trait loci (eQTLs) is important for understanding the genetic regulation of gene expression at the cell-type level and its relevance to complex traits. However, existing eQTL fine-mapping methods are limited in power and accuracy when cell types are analyzed separately. To improve eQTL mapping, we present CASE, a Bayesian framework to perform Cell-type-specific And Shared EQTL fine-mapping that simultaneously analyzes multiple cell types. CASE can effectively capture effect-sharing patterns across cell types while disentangling the confounding effects of linkage disequilibrium (LD). Through comprehensive simulations, we demonstrate that CASE outperforms the existing single-trait (SuSiE) and multi-trait (mvSuSiE) eQTL methods. When applied to the OneK1K data, CASE identified more genetic regulatory effects on gene expression, better capturing cell-type specificity and identifying functionally enriched, disease-associated eQTLs. The CASE framework for single-cell eQTL fine-mapping can be broadly applied to multi-tissue and multi-trait genetic studies.
Keywords
Empirical Bayes
MCEM
eQTL
causal inference
Rare variants (RVs) play a key role in complex disease genetics. Advances in whole-genome and whole-exome sequencing (WGS/WES) have facilitated the identification of RV associations. However, RV analysis has limited statistical power because of low allele frequencies. These issues are compounded for time-to-event (TTE) phenotypes, where high censoring rates and population structure can violate assumptions of standard association tests.
Here we present GATE-STAAR, an efficient and accurate framework for RV association tests of TTE phenotypes. We extend the burden test, SKAT, and ACAT to TTE phenotypes, and use saddlepoint and Gamma approximations to calibrate test statistics under extreme censoring. Functional annotations are integrated to improve statistical power. The method accounts for population structure while maintaining computational scalability for biobank-scale datasets.
Through extensive simulations, we demonstrate that GATE-STAAR substantially improves power while keeping type I error well controlled. Applying it to the 500K UKBB WGS data, we identified novel signals with implications for disease onset and progression. The findings highlight the promise of RV analyses for advancing our understanding of disease etiology.
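Of the tests named above, ACAT has a compact closed form: it combines p-values through a weighted sum of Cauchy-transformed values. The sketch below shows only that standard combination step; it does not reproduce the TTE extension, the saddlepoint or Gamma calibration, or the annotation weighting in GATE-STAAR. The function name `cauchy_combination` is hypothetical.

```python
import numpy as np

def cauchy_combination(pvalues, weights=None):
    """Combine p-values with the Cauchy combination (ACAT-style) rule.

    Weights are normalized to sum to one; equal weights are used if
    none are given. Note: this plain formula loses precision for
    p-values near machine epsilon, which a production version would
    need to handle separately.
    """
    p = np.asarray(pvalues, dtype=float)
    if weights is None:
        w = np.full(p.shape, 1.0 / p.size)
    else:
        w = np.asarray(weights, dtype=float)
        w = w / w.sum()

    # Each p-value maps to a standard Cauchy variate under the null.
    statistic = np.sum(w * np.tan((0.5 - p) * np.pi))

    # Upper tail of the standard Cauchy distribution at the statistic.
    return 0.5 - np.arctan(statistic) / np.pi
```

A useful sanity check is that combining a single p-value, or several identical p-values, returns that same value.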
Keywords
rare variants
time-to-event phenotypes
biobank studies
saddlepoint approximation
functional annotations
SKAT
Co-Author(s)
Xihao Li, University of North Carolina at Chapel Hill
Hufeng Zhou, Harvard University
Zilin Li, Northeast Normal University
Xihong Lin, Harvard T.H. Chan School of Public Health
First Author
Shuang Song, Harvard T.H. Chan School of Public Health
Presenting Author
Shuang Song, Harvard T.H. Chan School of Public Health
Rare variant association testing presents challenges due to sparsity and the complex nature of non-linear genetic interactions. SKAT and similar methods struggle to fully capture multi-scale patterns in this type of genetic variation. We introduce WaveKAT-F, a novel wavelet-Fourier framework that applies a custom-designed wavelet transformation with adaptive weighting and a specialized kernel tailored for genetic signal processing to enhance rare variant detection. By transforming genotype data into a multi-resolution representation, WaveKAT-F improves signal extraction while preserving both localized and global effects. Simulations and real-world datasets demonstrate that WaveKAT-F achieves higher power than existing methods, particularly in scenarios with mixed effect directions, weak associations, and low-frequency variants, while maintaining well-controlled Type I error rates. By integrating custom wavelet transforms, Fourier-based spectral analysis, and a unique kernel, WaveKAT-F provides a robust and flexible approach for identifying rare variant associations.
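The custom wavelet and kernel used by WaveKAT-F are not specified in the abstract. To illustrate what a multi-resolution representation of a genotype-derived signal looks like, the sketch below uses the orthonormal Haar wavelet as a generic stand-in; it is not the authors' transform. The function name `haar_transform` is hypothetical, and the input length is assumed to be a power of two.

```python
import numpy as np

def haar_transform(signal):
    """Full orthonormal Haar wavelet decomposition.

    Returns coefficients ordered from coarsest (overall level) to
    finest (differences between adjacent positions).
    Assumption: the input length is a power of two.
    """
    x = np.asarray(signal, dtype=float)
    n = x.size
    if n == 0 or n & (n - 1):
        raise ValueError("signal length must be a power of two")

    details = []
    while x.size > 1:
        pairs = x.reshape(-1, 2)
        # Scaled sums carry the coarse signal to the next level.
        approx = (pairs[:, 0] + pairs[:, 1]) / np.sqrt(2.0)
        # Scaled differences capture localized variation at this scale.
        detail = (pairs[:, 0] - pairs[:, 1]) / np.sqrt(2.0)
        details.append(detail)
        x = approx

    # x now holds the single coarsest approximation coefficient.
    return np.concatenate([x] + details[::-1])
```

Coarse coefficients summarize region-wide (global) effects, while fine coefficients isolate localized ones; because the transform is orthonormal, the total energy of the signal is preserved.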
Keywords
Rare Variant Association Testing
Custom Wavelet Transforms
Fourier Methods
Spectral Methods
Multi-Resolution Signal Processing
Genetic Association Studies