Advanced Statistical and Machine Learning based methods for Spatial Biology

Frederick Boehm, Chair
 
Satwik Acharyya, Organizer
University of Alabama at Birmingham
 
Sunday, Aug 3: 2:00 PM - 3:50 PM
0145 
Invited Paper Session 
Music City Center 
Room: CC-Davidson Ballroom B 

Keywords

Spatial transcriptomics

Spatial imaging

Machine learning

High-dimensional model 

Applied

Yes

Main Sponsor

Section on Statistics in Genomics and Genetics

Co-Sponsors

Biometrics Section
International Society for Computational Biology

Presentations

Mitigating Data Double Dipping in Statistical Tests Post-Unsupervised Learning: The Role of Synthetic Null Data and Data Splitting Approaches in Single-Cell and Spatial Transcriptomics

In single-cell and spatial transcriptomic data analysis, unsupervised learning techniques such as clustering are commonly used to create new variables, which are then subjected to subsequent statistical testing for feature screening, such as identifying cell types and cell-type marker genes. However, this process can introduce the issue of data double dipping, where the same data is used to generate and test hypotheses, potentially leading to biased results. This talk will focus on five strategies to address this challenge:

1. Parallel Synthetic Null (Song et al., bioRxiv 2023): Using synthetic null data, such as knockoff data, in parallel with real data as a digital alternative.
2. Concatenated Synthetic Null (DenAdel et al., bioRxiv 2024): Concatenating synthetic null data with real data, a technique known as data augmentation.
3. Data Splitting: Dividing data either by observations or features to ensure independent testing.
4. Data Thinning (Neufeld et al., JMLR 2024): Splitting each data point into two independent data points.
5. Data Fission (Leiner et al., JASA 2023): Creating two independent data points from each original data point.

We will analyze the comparative advantages of these approaches, with a particular focus on their applications in single-cell and spatial transcriptomics data analysis. The analysis will focus on the trade-offs between false discovery rate and discovery power.
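
As a concrete illustration of strategy 4, the sketch below shows data thinning in the Poisson special case. It assumes each count is Poisson distributed, which is an assumption made here for illustration and is not stated in the abstract. Under that assumption, drawing a Binomial(X, eps) share of each count X gives two parts that are independent and sum back to X. The function name `poisson_thin`, the parameter `eps`, and the simulated counts are hypothetical.

```python
import numpy as np


def poisson_thin(counts, eps=0.5, rng=None):
    """Split a matrix of Poisson counts into two parts that sum to the original.

    If X ~ Poisson(lam), then train ~ Poisson(eps * lam) and
    test ~ Poisson((1 - eps) * lam), and the two parts are independent.
    """
    rng = np.random.default_rng(rng)
    counts = np.asarray(counts)
    # Each unit of count is assigned to the training part with probability eps.
    train = rng.binomial(counts, eps)
    test = counts - train
    return train, test


# Simulated cells-by-genes count matrix (hypothetical data, Poisson by construction).
sim_rng = np.random.default_rng(0)
counts = sim_rng.poisson(lam=5.0, size=(20000, 3))

train, test = poisson_thin(counts, eps=0.5, rng=1)

# Sample correlation between the two parts for the first gene; it should be near zero.
corr = np.corrcoef(train[:, 0], test[:, 0])[0, 1]
```

In a workflow of this kind, clustering could be run on `train` and marker-gene tests on `test`, so that hypotheses are generated and tested on different values. Real counts are often overdispersed, in which case Poisson thinning may not give exactly independent parts.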
 

Keywords

Double dipping, single-cell data, spatial omics, data splitting 

Speaker

Jingyi Jessica Li, UCLA

Network models for Spatial Transcriptomics data

Network models are powerful tools for investigating complex dependence structures in high-throughput genomic datasets. They provide a holistic, systems-level view of biological processes that supports intuitive understanding and coherent interpretation. However, most existing network or graphical models assume homogeneous samples and are not readily amenable to modeling the spatial heterogeneity that often manifests in spatial genomics data. In this talk, I will discuss two spatial network models focusing on spatially varying covariance and precision matrices. (I) SpaceX (spatially dependent gene co-expression network) is a Bayesian methodology to identify both shared and cluster-specific co-expression networks across genes. (II) Spatial Graphical Regression (SGR) is a flexible approach based on graphical regression that enables spatially varying graphs over the spatial domain of the tissue. The framework incorporates multiple spatial covariates and provides linear and non-linear functional mappings between the spatial domain and the precision matrices. Both approaches are illustrated with case studies from cancer genomics.

Keywords

Spatially varying graphs, spatial transcriptomics, graphical regression 

Speaker

Satwik Acharyya, University of Alabama at Birmingham

Spatial Immunophenotyping from Whole-Slide Multiplexed Tissue Imaging using Convolutional Neural Networks

The multiplexed immunofluorescence (mIF) platform enables biomarker discovery through simultaneous detection of multiple markers on a single tissue slide, offering detailed insights into intratumor heterogeneity and the tumor-immune microenvironment at spatially resolved single-cell resolution. However, current mIF image analyses are labor-intensive and require specialized pathology expertise, which limits their scalability and clinical application. To address this challenge, we developed CellGate, a deep-learning (DL) computational pipeline that provides streamlined, end-to-end whole-slide mIF image analysis including nuclei detection, cell segmentation, cell classification, and combined immuno-phenotyping across stacked images. The model was trained on over 750,000 single-cell images from 34 melanomas in a retrospective cohort of patients, using whole tissue sections stained for an immune marker panel with manual gating and extensive pathology review. When tested on new whole mIF slides, the model demonstrated high precision-recall AUC. Further validation on whole-slide mIF images of primary melanomas from an independent cohort confirmed that CellGate can reproduce expert pathology analysis with high accuracy. We show that spatial immuno-phenotyping results from CellGate provide deep insights into immune cell topography and into differences in T cell functional states and interactions with tumor cells in patients with distinct histopathology and clinical characteristics. This pipeline offers a fully automated and parallelizable computing process with substantially improved consistency for cell type classification across images, potentially enabling high-throughput whole-slide mIF tissue image analysis for large-scale clinical and research applications.

Keywords

Spatial immunophenotyping, convolutional neural networks, spatial imaging 

Speaker

Ronglai Shen, Memorial Sloan-Kettering Cancer Center

Unleashing the Potential of Spatial Transcriptomics: Statistical Innovations for Translational Research

Recent advances in single-cell and spatial technologies are revolutionizing our understanding of gene and protein functions at the cellular level within tissues. These breakthroughs can lead to new insights from patient samples, such as tumor biopsies, which are critical for selecting personalized cancer treatments and understanding treatment responses. However, these complex data present significant statistical challenges that must be addressed to fully harness their potential. In this talk, I will discuss our efforts in developing novel statistical methods to streamline data analysis and generate new translational insights. Additionally, I will highlight our involvement in large-scale data generation initiatives, including the MOSAIC project, which aim to create extensive resources with thousands of samples available for analysis. 

Keywords

Large-scale spatial omics data, translational science, cancer biology

Speaker

Raphael Gottardo, Fred Hutchinson Cancer Research Center