Tuesday, Aug 5: 2:00 PM - 3:50 PM
4137
Contributed Papers
Music City Center
Room: CC-207B
Main Sponsor
Section on Statistics in Genomics and Genetics
Presentations
One of the major challenges in spatial transcriptomics is detecting spatially variable genes (SVGs), whose expression patterns are non-random across tissue locations. Many SVGs correlate with cell type compositions, which motivates the concept of cell type-specific SVGs (ctSVGs). Existing ctSVG detection methods treat cell type-specific spatial effects as fixed effects, so their results depend on how the tissue is rotated in the coordinate system. Moreover, SVGs may exhibit random spatial patterns within cell types, meaning an SVG is not always a ctSVG, and vice versa, which further complicates detection. We propose STANCE, a unified statistical model for detecting both SVGs and ctSVGs under a linear mixed-effect model framework that integrates gene expression, spatial location, and cell type composition information. STANCE ensures tissue rotation-invariant results through a two-stage approach: initial SVG/ctSVG detection followed by ctSVG-specific testing. We demonstrate its performance through extensive simulations and analyses of public datasets. Downstream analyses reveal STANCE's potential in spatial transcriptomics analysis.
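The rotation issue described above can be illustrated with a minimal sketch. It is not STANCE itself: the abstract does not say which covariance kernel the model uses, so the Gaussian kernel, the bandwidth, and the spot coordinates below are all assumptions made for illustration. The point is only that a covariance built from pairwise distances is unchanged when the tissue is rotated, while the raw coordinates that a fixed-effect term would use are not.

```python
import math

def rotate(points, angle):
    """Rotate 2-D points about the origin by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y, s * x + c * y) for x, y in points]

def gaussian_kernel(points, bandwidth=1.0):
    """K[i][j] = exp(-||s_i - s_j||^2 / (2 * bandwidth^2)); kernel choice is an assumption."""
    return [
        [math.exp(-math.dist(p, q) ** 2 / (2 * bandwidth ** 2)) for q in points]
        for p in points
    ]

def max_abs_diff(a, b):
    """Largest entrywise difference between two equally sized matrices."""
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))

# Hypothetical spot coordinates on a tissue section.
spots = [(0.0, 0.0), (1.0, 0.5), (2.0, 2.0), (0.5, 3.0)]
rotated = rotate(spots, math.radians(37))

# Distance-based kernel: unchanged by rotation, up to floating-point error.
kernel_gap = max_abs_diff(gaussian_kernel(spots), gaussian_kernel(rotated))

# Raw x-coordinates, as a fixed-effect covariate would see them: changed.
x_gap = max(abs(p[0] - q[0]) for p, q in zip(spots, rotated))

print(f"kernel difference: {kernel_gap:.2e}, x-coordinate difference: {x_gap:.2f}")
```

Any test statistic that depends on the locations only through such a kernel inherits the same invariance, which is the usual argument for modeling spatial effects as random effects with a distance-based covariance.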
Keywords
spatially variable genes
cell-type-specific spatially variable genes
spatial transcriptomics
spatial domain detection
Spatial transcriptomics (ST) provides unprecedented insights into gene expression patterns while retaining spatial context, making it valuable for understanding complex tissue architectures like cancers. Seurat, the most popular ST analysis tool, uses the Wilcoxon rank-sum test by default for differential expression (DE) analysis. However, as a nonparametric method that disregards spatial correlations, the Wilcoxon test can lead to inflated false positive rates and misleading findings, highlighting the need for a more robust statistical approach.
We propose a Generalized Score Test (GST) in the Generalized Estimating Equations (GEE) framework as a robust solution for DE analysis in ST. In extensive simulations, the GEE GST, which appropriately accounts for spatial correlations, showed superior Type I error control and comparable power relative to the Wilcoxon test and the GEE robust Wald test. Applications to ST datasets from breast and prostate cancer revealed that the GST-identified DE genes were predominantly enriched in pathways directly implicated in cancer progression, while the Wilcoxon test produced substantial false positives.
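The Type I error inflation mentioned above can be reproduced in a small simulation. This is a sketch under stated assumptions, not the authors' simulation design and not an implementation of the GEE GST: spatial correlation is mimicked by a shared random offset within each hypothetical tissue region, there is no true group difference, and the rank-sum p-value uses the normal approximation without tie handling. All function names and settings are illustrative.

```python
import math
import random

def rank_sum_pvalue(a, b):
    """Two-sided Wilcoxon rank-sum p-value (normal approximation, no ties)."""
    n1, n2 = len(a), len(b)
    pooled = sorted([(v, 0) for v in a] + [(v, 1) for v in b])
    # Sum of the 1-based ranks belonging to the first sample.
    w = sum(rank for rank, (_, g) in enumerate(pooled, start=1) if g == 0)
    mean = n1 * (n1 + n2 + 1) / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean) / sd
    return math.erfc(abs(z) / math.sqrt(2))

def simulate_group(rng, n_regions, spots_per_region, region_sd):
    """Null expression values; spots in one region share a random offset."""
    values = []
    for _ in range(n_regions):
        offset = rng.gauss(0.0, region_sd)
        values.extend(offset + rng.gauss(0.0, 1.0) for _ in range(spots_per_region))
    return values

def rejection_rate(region_sd, n_sim=500, alpha=0.05, seed=0):
    """Fraction of null datasets in which the rank-sum test rejects."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sim):
        a = simulate_group(rng, 5, 20, region_sd)
        b = simulate_group(rng, 5, 20, region_sd)
        if rank_sum_pvalue(a, b) < alpha:
            rejections += 1
    return rejections / n_sim

independent = rejection_rate(region_sd=0.0)  # spots are independent
correlated = rejection_rate(region_sd=1.0)   # spots are correlated within regions
print(f"rejection rate at alpha=0.05: independent={independent:.3f}, correlated={correlated:.3f}")
```

With independent spots the rejection rate should sit near the nominal 5%, while within-region correlation pushes it far above that level even though no gene is truly differentially expressed. A method that models the correlation, such as a GEE with a working correlation structure, is aimed at exactly this failure.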
Keywords
Differential expression
GEE
Generalized score test
Spatial transcriptomics
Wilcoxon rank-sum test
Type I error
Co-Author(s)
Chenxuan Zang, Department of Biostatistics, The University of Texas MD Anderson Cancer Center
Ziyi Li, MD Anderson Cancer Center
Charles Guo, Department of Pathology, The University of Texas MD Anderson Cancer Center
Dejian Lai, University of Texas, Health Science Center At Houston
Peng Wei, University of Texas, MD Anderson Cancer Center
First Author
Yishan Wang
Presenting Author
Yishan Wang
Cell-type deconvolution methods have been a driving force behind the rapid development of spatial transcriptomics (ST) technologies in the past few years. Although reference-based deconvolution methods have been studied extensively, there is still a large demand for methodology development in reference-free deconvolution. STdeconvolve is one of the earliest reference-free deconvolution methods. However, it does not take spatial information into account, which limits its practical utility. Here we introduce a reference-free approach called SpatialDC for spatially informed cell-type deconvolution of ST data. Our model encourages spatially close spots to share similar cell types, leading to improved spatial deconvolution results. We evaluate our model on both simulated and real datasets generated from various ST technologies, including a manually annotated dataset (MOB), 10X Visium, and DBiT-seq. The SpatialDC framework demonstrates robust performance in recovering accurate cell-type proportions and transcriptional profiles while effectively accounting for spatial correlations between pixels. This work presents statistical and computational advancements for analyzing complex spatial gene expression data.
Keywords
Spatial transcriptomics
Deconvolution
Reference-free
Latent Dirichlet Allocation (LDA)
Co-Author
Yuehua Cui, Michigan State University
First Author
Phuong Vo, Michigan State University
Presenting Author
Phuong Vo, Michigan State University
Recent technological advancements have made it possible to perform spatially resolved transcriptomic (SRT) profiling, which enhances our understanding of cell-cell communication within the context of tissues. However, current techniques require a compromise between experimental throughput and spatial resolution. Sequencing-based technologies prioritize higher experimental throughput, resulting in multicellular pixel data. These datasets call for computational methods that deconvolute cell types and avoid potential confounding within each pixel. Topic modeling methods, such as Latent Dirichlet Allocation (LDA), spatial LDA, and other statistical frameworks, provide a way to identify cell type composition from multicellular pixels. In this study, we evaluate several deconvolution approaches, assessing their effectiveness in capturing the cell type distribution per pixel and the gene expression distribution per cell type. Our analysis highlights the strengths and limitations of existing methods, offering guidance on best practices for analyzing multicellular pixel SRT data.
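For orientation, the standard LDA generative model can be written for multicellular pixels as follows. This is the textbook formulation, not a model taken from the abstract, and the notation is assumed: pixels p play the role of documents, genes the role of words, and the K latent cell types the role of topics, with N_p transcript counts observed in pixel p.

```latex
\begin{aligned}
\theta_p &\sim \operatorname{Dirichlet}(\alpha), & p &= 1,\dots,P
  && \text{(cell type proportions of pixel } p\text{)} \\
\beta_k &\sim \operatorname{Dirichlet}(\eta), & k &= 1,\dots,K
  && \text{(gene expression profile of cell type } k\text{)} \\
z_{pn} \mid \theta_p &\sim \operatorname{Categorical}(\theta_p), & n &= 1,\dots,N_p
  && \text{(cell type of transcript } n\text{)} \\
w_{pn} \mid z_{pn}, \beta &\sim \operatorname{Categorical}(\beta_{z_{pn}})
  && && \text{(gene identity of transcript } n\text{)}
\end{aligned}
```

Under this reading, the two quantities the evaluation targets correspond to the posterior estimates of \theta_p (cell type distribution per pixel) and \beta_k (gene expression distribution per cell type).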
Keywords
Spatially resolved transcriptomic data
Multicellular pixel data
Cell type deconvolution
Topic model
Latent Dirichlet Allocation
Cell-cell communication
Single-cell RNA sequencing (scRNA-seq) has advanced our understanding of biological systems, yet it fails to capture crucial components of the tissue transcriptome, such as neurite-localized transcripts and extracellular RNA. Spatial transcriptomics (ST) technologies offer an alternative by capturing transcript locations without tissue dissociation. However, existing approaches—such as cell type deconvolution and cell segmentation—primarily aim to recover single-cell-level information, overlooking the residual transcriptome: mRNAs that are either not captured by scRNA-seq or not assigned to any segmented cells in ST data. To address these limitations, we introduce RESCUE, a novel statistical framework that fully partitions gene expression data into contributions from known reference factors and the residual transcriptome. We formulate the problem as a penalized robust regression with a sparse mean-shift parameterization. To account for gene-specific variability, we employ iteratively reweighted adaptive Lasso-type weights. An efficient simulation-based surrogate matching pursuit algorithm is developed for the tuning procedure. Our results demonstrate that RESCUE outperforms existing methods in accurately decomposing ST data and recovers biologically meaningful signals that were previously overlooked. By fully leveraging the unbiased nature of ST data, RESCUE provides a more comprehensive view of transcriptomic organization both within and beyond cell bodies.
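As a point of reference, a generic penalized regression with a sparse mean-shift parameterization can be written as below. This is a sketch of the general technique only; the abstract does not give RESCUE's exact objective, loss, or weights, so the symbols here are assumptions: y is an observed expression vector, X holds the known reference factors, \gamma is the mean-shift vector that absorbs what the reference cannot explain, and w_i are adaptive Lasso-type weights.

```latex
\begin{aligned}
y &= X\beta + \gamma + \varepsilon, \\
(\hat\beta, \hat\gamma) &= \arg\min_{\beta,\,\gamma}\;
  \tfrac{1}{2}\,\lVert y - X\beta - \gamma \rVert_2^2
  + \lambda \sum_{i=1}^{n} w_i\,\lvert \gamma_i \rvert .
\end{aligned}
```

The \ell_1 penalty sets most entries of \gamma to zero, so the nonzero entries flag observations that deviate from the reference fit. In a decomposition of this kind, X\hat\beta would be the contribution of the known reference factors and \hat\gamma the residual component.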
Keywords
Spatial transcriptomics
Single-cell RNA sequencing
Sparse recovery
Robust estimation
Regularized multivariate regression
Co-Author(s)
Seokjin Yeo, University of Illinois at Urbana-Champaign
Alex Schrader, University of Illinois at Urbana-Champaign
Ian Traniello, Princeton University
Amy Cash Ahmed, University of Illinois at Urbana-Champaign
Gene Robinson, University of Illinois at Urbana-Champaign
Hee-Sun Han, University of Illinois at Urbana-Champaign
Sihai Dave Zhao, University of Illinois at Urbana-Champaign
First Author
Young Joo Lee, Department of Statistics, University of Illinois at Urbana-Champaign
Presenting Author
Young Joo Lee, Department of Statistics, University of Illinois at Urbana-Champaign
We propose a novel statistical framework for simultaneously clustering and deconvoluting spatially resolved transcriptomic (SRT) data. Specifically, we propose an estimation criterion that identifies clusters of spatial spots while also providing estimates of the cell-type compositions for each cluster. Our approach formulates the clustering problem as a well-posed optimization problem: minimizing the proposed criterion, which incorporates spatial structure and cellular heterogeneity. The problem is solved efficiently using a block coordinate descent algorithm in which each subproblem is convex. To ensure robust and data-driven model selection, we introduce a new strategy for parameter tuning, alongside a novel post-clustering inference framework. This framework addresses challenges such as inflated Type I error rates and enables valid hypothesis testing on the identified regions, providing a statistically rigorous basis for downstream analysis. Extensive simulation studies and real data applications demonstrate that our method significantly outperforms existing competitors, offering a scalable, interpretable, and reliable tool for analyzing complex SRT data.
Keywords
Clustering
Deconvolution
Spatial Transcriptomics
Post clustering inference
Optimization
Co-Author
Aaron Molstad, University of Minnesota
First Author
Hyun Jung Koo, School of Statistics, University of Minnesota - Twin Cities
Presenting Author
Hyun Jung Koo, School of Statistics, University of Minnesota - Twin Cities