Wednesday, Aug 6: 2:00 PM - 3:50 PM
0592
Topic-Contributed Paper Session
Music City Center
Room: CC-101C
Applied
Yes
Main Sponsor
Section on Statistics in Genomics and Genetics
Co Sponsors
International Chinese Statistical Association
Section on Statistical Learning and Data Science
Presentations
Identifying spatially variable genes (SVGs) is an essential task in spatial transcriptomics. Beyond SVGs, some genes show expression that follows developmental patterns or spatial trajectories across a tissue section, and identifying such genes could provide novel insights into tumor metastasis. Here, we introduce a unified statistical model to detect both types of genes. In addition, we propose a novel method to address the inherent double-dipping problem commonly encountered when assessing temporal gene effects in single-cell sequencing studies. We demonstrate the testing performance through extensive simulation studies and through analyses of several publicly available datasets. Downstream analyses further highlight the potential of our method for identifying genes associated with tumor progression and for enhancing domain detection.
Dissecting spatially varying relationships among features, such as cell type interactions, gene regulatory networks, or microenvironmental cues, requires regression models that explicitly address the dual challenges of spatial dependency and high dimensionality. Traditional spatial regression methods often fail to balance spatial smoothness with feature sparsity, leading to overfitted models or loss of interpretability in complex biological systems. To address this, we introduce Spatially Smooth Sparse Regression (S3R), a framework designed to resolve spatially coherent feature relationships through a unified regularization approach. S3R integrates (1) graph-guided spatial smoothing using minimum spanning trees (MSTs) to encode tissue topology, (2) L1/L2 penalties for individual and group-level sparsity, and (3) Adam optimizer-driven gradient descent for scalable high-dimensional optimization.
Applied to spatial transcriptomics data, S3R outperforms existing methods in accuracy and interpretability, recovering ground-truth coefficients and sparse feature sets. In biological contexts, S3R dissects feature relationships critical to tissue organization: it identifies layer-specific transcription factors in the human brain, macrophage-driven inflammatory responses in infected skin, and collagen-mediated T cell exclusion in breast cancer stroma. The model further resolves spatially restricted ligand-receptor pairs in pancreatic tumors that are invisible to single-cell analyses.
By rigorously addressing the spatial regression problem, S3R empowers researchers to unravel spatially organized regulatory networks across development and disease domains. The method's open-source implementation enables scalable, interpretable analysis of 10x Visium, Xenium, and MERFISH datasets, bridging a critical gap in spatial omics.
Speaker
Sha Cao, Indiana University, School of Medicine
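The three ingredients named in the S3R abstract (an MST over tissue locations, L1 and group-level L2 penalties, and a gradient-based optimizer) can be illustrated with a minimal sketch of a penalized objective. This is not the authors' implementation. The function names `mst_edges` and `penalized_loss`, the squared-difference form of the smoothness term, the choice of one group per feature, and the weights `lam_smooth`, `lam_l1`, and `lam_group` are all assumptions made for illustration. The optimization step is omitted; in practice a loss like this would be minimized with an optimizer such as Adam.

```python
import numpy as np

def mst_edges(coords):
    """Prim's algorithm on the complete Euclidean graph over spot coordinates.

    Returns a list of (parent, child) index pairs with n - 1 edges.
    """
    n = coords.shape[0]
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    in_tree = np.zeros(n, dtype=bool)
    in_tree[0] = True
    best_dist = dist[0].copy()          # cheapest link from the tree to each spot
    best_from = np.zeros(n, dtype=int)  # tree node that provides that link
    edges = []
    for _ in range(n - 1):
        cand = np.where(in_tree, np.inf, best_dist)
        j = int(np.argmin(cand))        # closest spot not yet in the tree
        edges.append((int(best_from[j]), j))
        in_tree[j] = True
        closer = dist[j] < best_dist
        best_dist = np.where(closer, dist[j], best_dist)
        best_from = np.where(closer, j, best_from)
    return edges

def penalized_loss(X, y, B, edges, lam_smooth, lam_l1, lam_group):
    """Squared-error fit with spot-specific coefficients B (n_spots x n_features).

    Assumed penalties (illustrative, not taken from the abstract):
      - smoothness: squared coefficient differences along MST edges
      - L1: entrywise sparsity
      - group L2: one group per feature (column of B)
    """
    resid = y - np.sum(X * B, axis=1)
    fit = np.mean(resid ** 2)
    i, j = np.array(edges).T
    smooth = np.sum((B[i] - B[j]) ** 2)
    l1 = np.sum(np.abs(B))
    group = np.sum(np.linalg.norm(B, axis=0))
    return fit + lam_smooth * smooth + lam_l1 * l1 + lam_group * group
```

Restricting the smoothness term to MST edges keeps the number of penalty terms at one fewer than the number of spots, which is one plausible reason to prefer a tree over a dense neighbor graph.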
Spatial transcriptomics (ST) provides valuable insights into molecular and spatial features of tissues, but associating ST data with patient phenotypes at the population scale is challenging. We introduce BSNMani_ST, a Bayesian scalar-on-network regression model with manifold learning, designed to predict clinical outcomes by linking ST features to population phenotypes in a scalable and interpretable manner. We applied BSNMani_ST to spatial transcriptomics data from the Seattle Alzheimer's Disease Brain Cell Atlas, as well as a single-cell imaging mass spectrometry dataset of breast cancers. BSNMani_ST identified biologically relevant gene co-expression subnetworks. These subnetworks are enriched for neurogenesis, neuronal communication, and signaling pathways in the Brain Cell Atlas data, and for immune-related antigens, cytokeratin, and hormone receptor antigens in the breast cancer data. In simulations using synthetic datasets with latent subnetworks, BSNMani_ST outperformed competing methods. These results underscore its robustness in capturing population-level patterns while incorporating clinical context.
The integration of large language models with single-cell gene expression data has introduced a new type of data that includes a gene embedding matrix alongside the traditional gene expression matrix. This study addresses the challenge of effectively merging these two data sources to improve the definition of cell-to-cell distances. We identify a computationally feasible method that significantly improves the clustering of cells of the same type in real single-cell datasets.
Speaker
Jun Li, University of Notre Dame
Large-scale cellular perturbation experiments, including those enabled by CRISPR-based technologies, allow high-throughput single-cell transcriptomics experiments to measure cellular responses to biological perturbations. We identify several statistical challenges in these datasets, including a high proportion of null effects and correlated effects across similar genes. To address these issues, we develop a sparse, low-rank modeling approach for improved estimation of cellular perturbation effects. Testing on simulated and real data, we compare our linear matrix modeling approach with existing deep learning methods and linear regression to demonstrate its value. We also explore whether our linear approach can outperform nonlinear methods for predicting combinatorial effects.
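As a rough illustration of how sparsity and low rank can be combined when estimating a perturbation-by-gene effect matrix, the sketch below alternates a truncated SVD with entrywise soft-thresholding. The abstract does not specify the model, so this is a generic low-rank-plus-sparse decomposition and not the authors' method. The names `sparse_low_rank`, `rank`, `tau`, and `n_iter` are hypothetical.

```python
import numpy as np

def soft_threshold(A, tau):
    """Shrink each entry toward zero by tau; entries within tau of zero become 0."""
    return np.sign(A) * np.maximum(np.abs(A) - tau, 0.0)

def truncated_svd(A, rank):
    """Best rank-`rank` approximation of A in the least-squares sense."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return (U[:, :rank] * s[:rank]) @ Vt[:rank]

def sparse_low_rank(Y, rank, tau, n_iter=50):
    """Alternately fit a low-rank part L and a sparse part S so that Y ~ L + S.

    Y is an observed effect matrix (perturbations x genes).
    L captures effects shared across correlated genes or perturbations;
    S keeps the few large entries that the low-rank part misses.
    """
    Y = np.asarray(Y, dtype=float)
    S = np.zeros_like(Y)
    for _ in range(n_iter):
        L = truncated_svd(Y - S, rank)
        S = soft_threshold(Y - L, tau)
    return L, S
```

A larger `tau` pushes more entries of the sparse part to exactly zero, which matches the setting of many null effects described in the abstract.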