Statistics in Modern Transcriptomics

Hui Jiang Chair
University of Michigan
 
Hui Jiang Organizer
University of Michigan
 
Wednesday, Aug 6: 2:00 PM - 3:50 PM
0592 
Topic-Contributed Paper Session 
Music City Center 
Room: CC-101C 

Applied

Yes

Main Sponsor

Section on Statistics in Genomics and Genetics

Co Sponsors

International Chinese Statistical Association
Section on Statistical Learning and Data Science

Presentations

A unified model to detect spatially-variable and trajectory-preserved genes in spatial transcriptomics

Identifying spatially variable genes (SVGs) is an essential task in spatial transcriptomics. Beyond SVGs, some genes show developmental patterns or spatial trajectories across a tissue section, and identifying such genes could provide novel insights into tumor metastasis. Here, we introduce a unified statistical model to detect both types of genes. In addition, we propose a novel method to address the inherent double-dipping problem commonly encountered when assessing temporal gene effects in single-cell sequencing studies. We demonstrate the testing performance through extensive simulation studies and analyses of several publicly available datasets. Downstream analyses further highlight the potential of our method for identifying genes associated with tumor progression and enhancing domain detection.

Speaker

Yuehua Cui, Michigan State University

Mining spatial -omics data with a spatially-aware high-dimensional regression method

Dissecting spatially varying relationships among features, such as cell type interactions, gene regulatory networks, or microenvironmental cues, requires regression models that explicitly address the dual challenges of spatial dependency and high dimensionality. Traditional spatial regression methods often fail to balance spatial smoothness with feature sparsity, leading to overfitting or loss of interpretability in complex biological systems. To address this, we introduce Spatially Smooth Sparse Regression (S3R), a framework designed to resolve spatially coherent feature relationships through a unified regularization approach. S3R integrates (1) graph-guided spatial smoothing using minimum spanning trees (MSTs) to encode tissue topology, (2) L1/L2 penalties for individual and group-level sparsity, and (3) Adam optimizer-driven gradient descent for scalable high-dimensional optimization.
Applied to spatial transcriptomics data, S3R outperforms existing methods in accuracy and interpretability, recovering ground-truth coefficients and sparse feature sets. In biological contexts, S3R dissects feature relationships critical to tissue organization: it identifies layer-specific transcription factors in the human brain, macrophage-driven inflammatory response in infected skin, and collagen-mediated T cell exclusion in breast cancer stroma. The model further resolves spatially restricted ligand-receptor pairs in pancreatic tumors that are invisible to single-cell analyses.
By rigorously addressing the spatial regression problem, S3R empowers researchers to unravel spatially organized regulatory networks across development and disease domains. The method's open-source implementation enables scalable, interpretable analysis of 10x Visium, Xenium, and MERFISH datasets, bridging a critical gap in spatial omics. 
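
The three ingredients listed in the abstract (an MST over tissue locations, sparsity penalties, and a gradient-based optimizer) can be illustrated with a minimal sketch of how such an objective might be assembled. This is not the S3R implementation: the function names, the absolute-difference form of the smoothness penalty, and the parameters `lam_smooth`, `lam_l1`, and `lam_group` are all assumptions made for illustration, and the Adam minimization step is omitted.

```python
import math


def mst_edges(coords):
    """Prim's algorithm on the complete Euclidean graph over spot coordinates.

    Returns a list of (parent, child) index pairs forming a minimum spanning tree.
    """
    n = len(coords)
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known distance from each spot to the tree
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        # Pick the spot outside the tree that is closest to it.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(coords[u], coords[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return edges


def penalized_objective(X, y, B, edges, lam_smooth, lam_l1, lam_group):
    """Squared loss plus smoothness, L1, and group (L2-norm) penalties.

    X: n spots x p features, y: n responses,
    B: n x p spot-specific coefficients (hypothetical parameterization).
    """
    n, p = len(X), len(X[0])
    loss = sum(
        (y[i] - sum(X[i][j] * B[i][j] for j in range(p))) ** 2 for i in range(n)
    )
    # Smoothness: coefficients of MST-adjacent spots should be similar.
    smooth = sum(abs(B[a][j] - B[b][j]) for a, b in edges for j in range(p))
    # Individual sparsity: L1 norm over all coefficients.
    l1 = sum(abs(B[i][j]) for i in range(n) for j in range(p))
    # Group sparsity: L2 norm of each feature's coefficients across spots.
    group = sum(math.sqrt(sum(B[i][j] ** 2 for i in range(n))) for j in range(p))
    return loss + lam_smooth * smooth + lam_l1 * l1 + lam_group * group
```

In a full method, an optimizer such as Adam would minimize this objective over `B`; restricting the smoothness term to MST edges keeps the number of penalty terms linear in the number of spots.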

Speaker

Sha Cao, Indiana University, School of Medicine

BSNMani_ST: A Bayesian Model for Linking Spatial Transcriptomics Features to Patient Phenotypes at the Population Scale

Spatial transcriptomics (ST) provides valuable insights into molecular and spatial features of tissues, but associating ST data with patient phenotypes at the population scale is challenging. We introduce BSNMani_ST, a Bayesian scalar-on-network regression model with manifold learning, designed to predict clinical outcomes by linking ST features to population phenotypes in a scalable and interpretable manner. We applied BSNMani_ST to spatial transcriptomics data from the Seattle Alzheimer's Disease Brain Cell Atlas, as well as a single-cell imaging mass spectrometry dataset of breast cancers. BSNMani_ST identified biologically relevant gene co-expression subnetworks. These subnetworks are enriched for neurogenesis, neuronal communication, and signaling pathways in the Brain Cell Atlas data and for immune-related antigens, cytokeratin, and hormone receptor antigens in the breast cancer data. In simulations using synthetic datasets with latent subnetworks, BSNMani_ST outperformed competing methods. These results underscore its robustness in capturing population-level patterns while incorporating clinical context.

Co-Author

Lana Garmire

Speaker

Lana Garmire

Towards efficient integration of LLM gene embeddings to gene expression analysis

The integration of large language models with single-cell gene expression data has introduced a new type of data that includes a gene embedding matrix alongside the traditional gene expression matrix. This study addresses the challenge of effectively merging these two data sources to improve the definition of cell-to-cell distances. We identify a computationally feasible method that significantly improves the clustering of cells of the same type in real single-cell datasets.
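
The abstract does not say how the two matrices are merged. As an illustration of the general problem only, the sketch below shows one simple baseline: summarize each cell as the expression-weighted average of gene embeddings, then blend the distance in expression space with the distance in embedding space. The function names and the mixing weight `alpha` are hypothetical, and this should not be read as the method presented in the talk.

```python
import math


def cell_embeddings(expr, gene_emb):
    """Expression-weighted average of gene embeddings for each cell.

    expr: cells x genes (non-negative values), gene_emb: genes x d.
    """
    d = len(gene_emb[0])
    out = []
    for row in expr:
        total = sum(row)
        if total == 0:
            # A cell with no expression gets the zero vector.
            out.append([0.0] * d)
            continue
        out.append(
            [sum(w * g[k] for w, g in zip(row, gene_emb)) / total for k in range(d)]
        )
    return out


def combined_distance(expr, gene_emb, alpha=0.5):
    """Blend expression-space and embedding-space Euclidean distances.

    alpha = 1 uses expression only; alpha = 0 uses embeddings only.
    """
    emb = cell_embeddings(expr, gene_emb)
    n = len(expr)
    return [
        [
            alpha * math.dist(expr[i], expr[j])
            + (1 - alpha) * math.dist(emb[i], emb[j])
            for j in range(n)
        ]
        for i in range(n)
    ]
```

The resulting matrix could be passed to any clustering routine that accepts precomputed distances. In practice the two terms would likely need rescaling before blending, since they live on different scales.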

Speaker

Jun Li, University of Notre Dame

Sparse low rank models for cellular perturbation experiments

Large-scale cellular perturbation experiments, including those enabled by CRISPR-based technologies, allow high-throughput single-cell transcriptomics experiments to measure cellular responses to biological perturbations. We identify several statistical challenges in these datasets, including a high proportion of null effects and correlated effects across similar genes. To address these issues, we develop a sparse, low-rank modeling approach for improved estimation of cellular perturbation effects. Testing on simulated and real data, we compare against existing deep learning methods and linear regression to demonstrate the value of our linear matrix modeling approach. We also explore whether our linear approach can outperform nonlinear methods for predicting combinatorial effects.
 

Co-Author

Dylan Cable, University of Michigan

Speaker

Dylan Cable, University of Michigan